New daos setup


BASDEN, ALASTAIR G.
 

Hi,

I am trying to set up a new daos system, and could do with a bit of help.

It is a simple test setup, with a single server.

The server has 2 CPUs, each with DCPM non-volatile memory attached: 1.6TB per CPU (12x 256GB DIMMs in total).

It also has 12 NVMe drives, each 1.6TB.

My /etc/daos/daos_server.yml file looks like (the server name is daos):
name: daos1
access_points: ['daos']
transport_config:
  allow_insecure: true
  client_cert_dir: /etc/daos/daosCA/certs
  ca_cert: /etc/daos/daosCA/certs/daosCA.crt
  cert: /etc/daos/daosCA/certs/server.crt
  key: /etc/daos/daosCA/certs/server.key
fabric_ifaces: [ib0]
provider: ofi+verbs;ofi_rxm
servers:
-
  rank: 0
  targets: 6
  first_core: 0
  fabric_iface: ib0
  fabric_iface_port: 20000
  scm_mount: /mnt/daos/1
  scm_class: dcpm
  scm_list: [/dev/pmem0]
  bdev_class: nvme
  rank: 1
  targets: 6
  first_core: 24
  fabric_iface: ib0
  fabric_iface_port: 20001
  scm_mount: /mnt/daos/2
  scm_class: dcpm
  scm_list: [/dev/pmem1]
  bdev_class: nvme


To start it, I am running "daos_server". "dmg -i storage scan" then returns:
daos:10001: connected
Hosts SCM Total             NVMe Total
----- ---------             ----------
daos  3.2 TB (2 namespaces) 19 TB (12 controllers)



First question: what are rank 0 and rank 1? Can I do it all in a single rank? (I read that scm_list can only have 1 entry, which is why I set up 2 ranks.)

If using 2 ranks is correct, do I need to do something to start the other one? Currently, only /dev/pmem1 is mounted, not /dev/pmem0, so it seems that rank 0 is being ignored.

I then run "daos_agent", and the daos_agent.yml file has:
name: daos1
access_points: ['daos']
ca_cert: /etc/daos/daosCA/certs/daosCA.crt
cert: /etc/daos/daosCA/certs/agent.crt
key: /etc/daos/daosCA/certs/agent.key

If I then run:
dfuse --svc=0 --mountpoint=/mnt/daosusermnt
it returns without any complaints, but I don't have anything mounted.

How do I proceed from here to get it mounted as a file system?

Thanks,
Alastair.


Jacque, Kristin
 

Hi Alastair,

In your case, running with two ranks is required if you want to use all of your DCPM. The assumption is that each rank runs on a particular CPU and uses the storage attached to that CPU.

I can offer a few tips for getting going as well.

First: if you're using "allow_insecure: true", you should ensure it's set in all of the yml config files, and comment out or remove all of the other CA/certificate-related options. If this is inconsistent amongst the different components, they'll be unable to talk to each other.
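
As a rough sketch of how to check this, the shell function below prints every allow_insecure and certificate line from the yml files in one directory, so a mismatch between components stands out. The function name show_transport_settings and the default directory /etc/daos are assumptions for illustration; adjust them to wherever your server, agent, and control configs actually live.

```shell
# Sketch: list the transport-related settings in each DAOS yml file.
# Assumption: all config files sit in one directory (default /etc/daos).
# Commented-out lines (starting with '#') are deliberately not matched.
show_transport_settings() {
  dir="${1:-/etc/daos}"
  grep -H -n -E \
    '^[[:space:]]*(allow_insecure|client_cert_dir|ca_cert|cert|key):' \
    "$dir"/*.yml
}
```

Running "show_transport_settings /etc/daos" should then show, per file and line number, whether allow_insecure is set and whether any certificate options are still active alongside it.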

In the daos_server.yml, you may also need to add the bdev_list of NVMe addresses for the SSDs for each rank. I'm not sure whether DAOS can auto-populate that yet when it's not supplied in the config. You can try without it, but if that doesn't work, you may need to add them to the server config and reformat.

Are you doing a dmg storage format after you start the daos_server? It should be necessary the first time you start up. This will set up your SCM mount points. If you haven't done this yet, it's probably why your ranks haven't started up. You might have to unmount your SCM beforehand if it's already mounted.
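
The sequence described above could look roughly like the following. This is only a sketch: it assumes daos_server is already running and waiting for a format, that dmg is used in insecure mode (-i) as in the original post, and that the installed dmg provides the "system query" subcommand. The run helper is a hypothetical wrapper that just prints each command unless RUN=1 is set, and the mount point is the one reported as mounted in the original post; umount will normally need root.

```shell
#!/bin/sh
# Sketch of a first-time storage format, under the assumptions stated above.

# Hypothetical helper: print the command, execute it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Unmount SCM left mounted by an earlier attempt (path from the posted config).
run umount /mnt/daos/2

# Format SCM and NVMe; this is what sets up the scm_mount points.
run dmg -i storage format

# Afterwards, check whether both ranks have started and joined.
run dmg -i system query
```

Without RUN=1 the script only lists what it would do, which makes it easy to review before touching any storage.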

Hopefully this will help you get started.

Thanks,
Kris

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of BASDEN, ALASTAIR G.
Sent: Wednesday, January 27, 2021 10:23 AM
To: daos@daos.groups.io
Subject: [daos] New daos setup
