Startup Errors

Petrillo, Neale A. (Contractor) <Neale.Petrillo@...>
 

Hello Group! 

I'm having some trouble getting my new DAOS cluster working. I've installed 6 servers all with the 1.0.1 RPMs. When I do a 'dmg storage format' from my test host, I get the following output:

 

[root@head ~]# dmg -i -l <host01>:10001 storage format
ERROR: <host01>:10001: socket connection is not active (TRANSIENT_FAILURE)
ERROR: dmg: no active connections
[root@head ~]# dmg -i -l <host01> system query
ERROR: <host01>:10001: socket connection is not active (TRANSIENT_FAILURE)
ERROR: dmg: no active connections

 

I'm also seeing these errors in the log files:

 

INFO 2021/02/18 10:40:15 DAOS I/O Server instance 0 storage not ready: context canceled
INFO 2021/02/18 10:40:19 SCM format required on instance 1
INFO 2021/02/18 10:40:19 DAOS I/O Server instance 1 storage not ready: context canceled
INFO 2021/02/18 10:40:19 DAOS Control Server (pid 9993) shutting down
ERROR 2021/02/18 10:40:54 /usr/bin/daos_admin EAL: No free hugepages reported in hugepages-1048576kB
INFO 2021/02/18 10:41:00 DAOS Control Server (pid 11507) listening on 0.0.0.0:10001
INFO 2021/02/18 10:41:00 Waiting for DAOS I/O Server instance storage to be ready...
INFO 2021/02/18 10:41:04 SCM format required on instance 0

 

Configuration files are attached. Any help would be appreciated! 

Neale



unable to create pool

asharma@...
 

Hi team,
I have a basic DAOS client and server running. When I run dmg storage query commands on the client side, I get responses, which means I am able to interact with the server. But when I run this pool create command:
dmg -i pool create -s 1G -n 10G -g root -u root -S daos_server
the response is:
Creating DAOS pool with manual per-server storage allocation: 1.0 GB SCM, 10 GB NVMe (10.00% ratio)
It stays like this for a really long time, and then I get an error:
ERROR: dmg: context deadline exceeded

I am configuring the server with the daos_server_local.yml config file, so SCM is set to ram and NVMe is set to file, to emulate both. I thought pools could still be created on emulated storage. Am I doing something wrong? Can I not create pools on emulated SCM? Without creating a pool I cannot really do anything else.
 
I am attaching a screenshot of the error for reference. Please let me know what I could be doing wrong, or whether creating pools on emulated SCM storage is not supported.


Re: i/O Timeout on dmg storage query

Lombardi, Johann
 

Hi,

 

Maybe you have a firewall running on the VM? You could maybe try to run the containers with --network=host to use the host network.

Please note that you will also have to open up ports 31416 and 31417 for the engine/io_server to process incoming requests.

 

On my side, when I run multiple docker containers on the same node, I create a daosnet network (i.e. docker network create -d bridge daosnet) and then add “--network daosnet” to each “docker run” command line. That being said, I haven’t tried to run the containers on different nodes/VMs yet.
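
One way to narrow down a firewall or port-mapping problem like this is to probe the relevant ports from the client side before involving dmg at all. The following is a minimal sketch, not part of DAOS: the helper names check_tcp_port and report are made up for illustration, and it only tests plain TCP reachability. Whether the engine ports are reachable over TCP at all depends on the fabric provider in use, which is an assumption here.

```python
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def report(host, ports, timeout=3.0):
    """Map each port number to True (reachable) or False (not reachable)."""
    return {port: check_tcp_port(host, port, timeout) for port in ports}


# Example call (hypothetical host name, ports taken from this thread):
#   report("server-host", [10001, 31416, 31417])
```

A False for port 10001 from the client container, combined with True from the server host itself, would point at the firewall or the container network rather than at the DAOS configuration.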

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "asharma@..." <asharma@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 10 February 2021 at 19:45
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] i/O Timeout on dmg storage query

 


Hi Team,
I have a simple setup of two VMs in the cloud, with DAOS set up on both of them as Docker containers. One container runs a server that I started with the daos_server_local.yml file; I only changed access_point to the domain name of the VM hosting the server.
The other container has the DAOS agent installed with the sample daos_agent.yml config. I have also set up daos_control.yml with a host list containing the server VM's domain name. My architecture looks something like the picture below.



From inside the container running on the client VM, I run a simple query: dmg storage query usage

I get the following error:

Errors:
  Hosts                              Error
  -----                              -----
  letldaos.eastus.cloudapp.azure.com rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp 138.91.118.108:10001: i/o timeout"


Note: The same query runs absolutely fine if I run a DAOS server inside the client container with access_point set to 'localhost'.
I guess I am missing something really small. At first I thought the problem was that port 10001 was not exposed to the host from inside the container, so I tried running the container with -p 10001:10001, but it still gives the same error. I am out of ideas. Can someone please suggest a possible solution?


I am attaching my config files for reference. I have set allow_insecure: true in all 3 files.

 

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Questions on DTX and KV Puts values visibility

Yong, Fan
 

“Committable” refers to the DTX status, which controls the visibility of the related data. Before the DTX becomes committable, the related data is invisible to clients even if it has already been written to persistent storage. The DTX becomes committable only when the leader has executed the modification locally and has received successful replies from all related non-leader replicas. Once the DTX is committable on the leader, for async DTX (the default), the IO handler (ULT) replies to the client immediately. At that time, the DTX status on the non-leader replicas is ‘prepared’. Some time later, another ULT (the async batched commit ULT) on the leader sends a DTX commit RPC to the non-leader replicas, which persistently changes the DTX status on all replicas. My earlier comment about “commit to persistent storage” referred to this step.
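
The sequence described above can be pictured as a small state machine. The sketch below is only a toy model written for this explanation; the class and method names (ToyDtx, on_replica_success, batched_commit) are invented and do not correspond to DAOS source code. It captures three points: data becomes visible once the leader's DTX is committable, replicas are still 'prepared' at that moment, and a later batched commit moves everything to committed.

```python
from enum import Enum


class DtxState(Enum):
    PREPARED = 1      # modification applied, not yet visible to clients
    COMMITTABLE = 2   # all replicas acknowledged, visible to clients
    COMMITTED = 3     # commit status persisted on every replica


class ToyDtx:
    """Toy model of one DTX on the leader plus its non-leader replicas."""

    def __init__(self, replica_names):
        self.pending = set(replica_names)   # replicas that have not replied yet
        self.replica_states = {}
        # With no non-leader replicas there is nothing to wait for.
        self.leader_state = (
            DtxState.PREPARED if self.pending else DtxState.COMMITTABLE
        )

    def on_replica_success(self, name):
        """A non-leader replica reports that it executed the modification."""
        self.pending.discard(name)
        self.replica_states[name] = DtxState.PREPARED
        if not self.pending:
            # The leader may now reply to the client (async DTX).
            self.leader_state = DtxState.COMMITTABLE

    def visible(self):
        """Visibility follows the leader's DTX status, not the commit itself."""
        return self.leader_state in (DtxState.COMMITTABLE, DtxState.COMMITTED)

    def batched_commit(self):
        """Stand-in for the later DTX commit RPC sent by the batched commit ULT."""
        if self.leader_state is not DtxState.COMMITTABLE:
            raise RuntimeError("only a committable DTX can be committed")
        self.leader_state = DtxState.COMMITTED
        for name in self.replica_states:
            self.replica_states[name] = DtxState.COMMITTED
```

In this model a put is readable as soon as visible() returns True, even though the replica entries stay at PREPARED until batched_commit() runs.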

 

--

Cheers,

Nasf

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of ping.wong via groups.io
Sent: Tuesday, February 9, 2021 4:39 PM
To: daos@daos.groups.io
Subject: Re: [daos] Questions on DTX and KV Puts values visibility

 

Hi Nasf,

You mentioned that when the leader replies to the client, the related put is committable on the server but not committed to persistent storage. 
However, I found that the put has already been written to SSD via bio_rw on both the leader and the replica in earlier steps. Please explain the asynchronous commit that happens after the leader replies to the client. Are there any RPCs that the leader has to send to the replica during the asynchronous commit phase to indicate that the DTX is finally committed? How do the replica and the server agree that they have both committed the DTX?

Ping
 



Re: Questions on DTX and KV Puts values visibility

Yong, Fan
 

Then how can you guarantee that the read request is sent only after the write request has been replied to? If the read returns an old value, how do you know whether that is expected or not?

In your case, what DAOS should (and can) guarantee is this: if a thread reads a key and gets value ‘a’, and the same thread then reads the same key a second time and gets value ‘b’, then if ‘b’ is not the same as ‘a’, ‘b’ must be newer than ‘a’.
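
The guarantee stated above can be checked against a test log with a few lines of code. The following is a minimal sketch under two assumptions that are not from the thread: the order in which values were written to a key is known, and every written value for that key is unique. The function name reads_are_monotonic is made up for illustration.

```python
def reads_are_monotonic(write_order, observed_reads):
    """Check that one thread's successive reads of one key never go backwards.

    write_order:    values in the order they were written to the key
                    (assumed unique for this key).
    observed_reads: values a single reader thread saw for that key, in order.
    """
    position = {value: index for index, value in enumerate(write_order)}
    last_seen = -1
    for value in observed_reads:
        index = position[value]  # KeyError means the read value was never written
        if index < last_seen:
            return False         # an older value showed up after a newer one
        last_seen = index
    return True
```

Reading the same value repeatedly, or skipping intermediate values, still passes; only a step back to an older value fails. Reads from different threads are deliberately not compared with each other, because the guarantee above is stated per thread.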

 

--

Cheers,

Nasf

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of ping.wong via groups.io
Sent: Tuesday, February 9, 2021 3:37 PM
To: daos@daos.groups.io
Subject: Re: [daos] Questions on DTX and KV Puts values visibility

 

Hi Nasf,

ReadThreadFunc thread 0 and  WriteThreadFunc thread 0 are two separate threads.  All threads are running in parallel without any control.

Ping



Re: Questions on DTX and KV Puts values visibility

Yong, Fan
 

Hi Ping,

 

Is “ReadThreadFunc thread 0” the same thread as “WriteThreadFunc thread 0”, or are they two different threads? Is there any concurrency control between the read threads and the write threads, or do all the threads run in parallel without any control?

 

More comments inline.

 

--

Cheers,

Nasf

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of ping.wong via groups.io
Sent: Tuesday, February 9, 2021 1:50 PM
To: daos@daos.groups.io
Subject: [daos] Questions on DTX and KV Puts values visibility

 

Hi All,

 

In a two-node cluster (one master/leader and one replica), I wrote a client application with multiple writer and reader threads working against the same object with different keys and values:

1. Writers issue daos_kv_put(oh, DAOS_TX_NONE, ..., &ev); call daos_event_test(&ev, DAOS_EQ_WAIT, &ev_flag);                // wait for IO completion

2. Readers issue daos_kv_get(oh, DAOS_TX_NONE, 0, key, &size, buf, &ev); call daos_event_test(&ev, DAOS_EQ_WAIT, &ev_flag); // wait for IO completion

 

I observed that daos_kv_get occasionally returns an older version. I have a few questions regarding DTX and the visibility of values from the client's perspective.

 

1. On the master/leader server, when it sends the RPC reply to the client for daos_kv_put():

     1.1 Has the put been committed before returning to the client?  If not, what is the typical asynchronous commit threshold to make the put values visible to other clients? 

[Nasf] By default, when the leader server replies to the client, the related update (put) is committable on the server but not yet really committed to persistent storage. That is the typical asynchronous commit. Even with async commit, the update (put) is visible to other clients as soon as the leader server has replied to the client that issued it. Visibility has nothing to do with the real commit to persistent storage. However, if there is no communication between clients, it is not easy to guarantee that the read on one client happens after the write on another client.


     1.2 Under what condition(s), does the leader commit the transaction?

[Nasf] For async commit, a dedicated ULT periodically commits the committable DTX entries in batches. Either of two conditions triggers it: the number of committable DTX entries exceeds a threshold, or some DTX entries have become too old.


     1.3 Does the leader send RPC to replica to commit the transaction synchronously or asynchronously after replying to the client?

[Nasf] The DTX batched commit ULT (not the IO handler) on the leader asynchronously commits DTX entries after the leader has replied to the related clients.

   

2. On the master server, how long does it wait (timeout) for the replica to reply?

[Nasf] It is the CaRT timeout, 60 seconds by default.

 
    2.1 Does the master server send RPC to replica if the leader does not get reply from replica (e.g. when replica dies)?

[Nasf] If the replica (non-leader) is dead, the leader will hit the timeout and the related update (put) will fail.


    2.2 Under what condition(s) does the master server commit synchronously?

[Nasf] The application can require the leader to commit the related update (put) synchronously via some RPC flags. Also, when the leader cannot do an async commit, it commits the related DTX synchronously.
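
The two triggers named in answer 1.2 (too many committable entries, or entries that are too old) can be illustrated with a toy batching queue. This is a sketch only: ToyBatchedCommitter and its threshold parameters are hypothetical, the values used are not DAOS defaults, and the real batched commit ULT is not structured like this.

```python
import time


class ToyBatchedCommitter:
    """Toy queue that flushes on a count threshold or an age threshold."""

    def __init__(self, count_threshold, age_threshold_s, clock=time.monotonic):
        self.count_threshold = count_threshold  # hypothetical limit on queued entries
        self.age_threshold_s = age_threshold_s  # hypothetical maximum age in seconds
        self._clock = clock
        self._entries = []  # (dtx_id, time it became committable), oldest first

    def add(self, dtx_id):
        """Record a DTX that has just become committable."""
        self._entries.append((dtx_id, self._clock()))

    def should_flush(self):
        """True if either trigger condition holds."""
        if not self._entries:
            return False
        if len(self._entries) > self.count_threshold:
            return True
        oldest_time = self._entries[0][1]
        return self._clock() - oldest_time > self.age_threshold_s

    def flush(self):
        """Return the batch to commit and empty the queue."""
        batch = [dtx_id for dtx_id, _ in self._entries]
        self._entries.clear()
        return batch
```

The clock is injectable so that the age condition can be exercised in a test without sleeping.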


The following is a partial output from the client application:

 

ReadThreadFunc thread 2 key=keyabcde00000006 buf=Value00000001
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000001
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000001
ReadThreadFunc thread 1 key=keyabcde00000005 buf=Value00000005
ReadThreadFunc thread 1 key=keyabcde00000005 buf=Value00000005
WriteThreadFunc thread 0 put key=keyabcde00000004 buf=Value00000003    <-- [A] put older version
WriteThreadFunc thread 1 put key=keyabcde00000006 buf=Value00000003
WriteThreadFunc thread 2 put key=keyabcde00000007 buf=Value00000004
ReadThreadFunc thread 2 key=keyabcde00000006 buf=Value00000002
ReadThreadFunc thread 2 key=keyabcde00000006 buf=Value00000002
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000002
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000002
ReadThreadFunc thread 1 key=keyabcde00000005 buf=Value00000005
ReadThreadFunc thread 1 key=keyabcde00000005 buf=Value00000005
WriteThreadFunc thread 0 put key=keyabcde00000004 buf=Value00000004    <-- [B] put newer version
WriteThreadFunc thread 1 put key=keyabcde00000006 buf=Value00000004
WriteThreadFunc thread 2 put key=keyabcde00000007 buf=Value00000005
ReadThreadFunc thread 2 key=keyabcde00000006 buf=Value00000003
ReadThreadFunc thread 2 key=keyabcde00000006 buf=Value00000003
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000003         <-- get older version (the value put at [A])
ReadThreadFunc thread 0 key=keyabcde00000004 buf=Value00000003

                                            ...

WriteThreadFunc thread 2 put key=keyabcde00000003 buf=Value00000005    <-- [C] put older version
Event Write thread 0 started buf_size=4096 start=1 end=5
WriteThreadFunc thread 1 put key=keyabcde00000002 buf=Value00000004
Event Read thread 2 started buf_size=4096 start=3 end=7
WriteThreadFunc thread 2 put key=keyabcde00000004 buf=Value00000001
Event Read thread 1 started buf_size=4096 start=2 end=6
WriteThreadFunc thread 1 put key=keyabcde00000002 buf=Value00000005
WriteThreadFunc thread 1 put key=keyabcde00000003 buf=Value00000001    <-- [D] put newer version (relative to [C]); put older version (relative to [E])
ReadThreadFunc thread 2 key=keyabcde00000003 buf=Value00000005         <-- get older version (the value put at [C])
ReadThreadFunc thread 2 key=keyabcde00000003 buf=Value00000005
ReadThreadFunc thread 0 key=keyabcde00000001 buf=Value00000001
ReadThreadFunc thread 0 key=keyabcde00000001 buf=Value00000001
ReadThreadFunc thread 1 key=keyabcde00000002 buf=Value00000005
ReadThreadFunc thread 1 key=keyabcde00000002 buf=Value00000005
WriteThreadFunc thread 0 put key=keyabcde00000001 buf=Value00000002
WriteThreadFunc thread 2 put key=keyabcde00000004 buf=Value00000003
WriteThreadFunc thread 1 put key=keyabcde00000003 buf=Value00000002    <-- [E] put newer version
ReadThreadFunc thread 2 key=keyabcde00000003 buf=Value00000001         <-- get older version (the value put at [D])

 


Thanks 

Ping

 



Re: Tutorial or Examples for Go management API

asharma@...
 

Hi,

I tried running daos/src/control/cmd/drpc_test/main.go to run the hello example. I thought it would be a simple test example to demo the control API, but it gives me the error "-lgurt not found". I have built the CaRT git repository in my home directory as well, and I have also built the entire DAOS source tree with the SConstruct file. I admit I am extremely new to this technology and to the Go language. I am trying to figure things out because I work at a startup that plans to use DAOS as its major storage solution on SCM.


Tutorial or Examples for Go management API

asharma@...
 

Hi team,

I have been trying to figure out how to use the Go management API to talk to the server and then integrate that management API with my application. I have a few questions; an answer to any of them would be really helpful:

1) Are there any basic examples I can run, or a guide that shows how to use the management API to talk to the server?
2) For now I have a simple server (with the daos_server_local.yml configuration) running in a container on an Azure VM. What else do I need to demo the basic functionality of the management API? My server has no NVMe or SCM. I am using this setup to develop an initial version of an application that works against a server emulating SCM on DRAM.
I know there is a way to install the client with $ yum install daos client, but that is only for RPM installs, which are not working for me.
3) I know there are different modules for client and server. The Docker installation documentation only explains how to set up a basic server. How do I build and set up a client? The SConscript file in the client folder is only called from the root folder when I build with SConstruct. How do I install a client or agent? My understanding is that I need the client to use the Go management API.
4) Can I achieve what I am trying to do using my everyday laptop and a VM with DRAM and SSD, interacting over HTTP, or do I need a specific hardware setup?

Regards


Re: New DAOS Setup

asharma@...
 

Hi,

If you are just using the daos_server_local.yml config file and do not need NVMe, just set nr_hugepages: 0 in that file. This takes care of the error and lets the server start.
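
Before deciding whether to set nr_hugepages to 0, it can help to see how many hugepages the host (or container) actually exposes. On Linux these counters appear in /proc/meminfo. The sketch below only parses that text; the function names parse_hugepages and read_hugepages are made up for illustration and nothing here is a DAOS API.

```python
import re

# Counters of interest as they appear in /proc/meminfo on Linux.
_HUGEPAGE_FIELD = re.compile(
    r"^(HugePages_Total|HugePages_Free|Hugepagesize):\s+(\d+)"
)


def parse_hugepages(meminfo_text):
    """Extract hugepage counters from /proc/meminfo-style text.

    Hugepagesize is reported by the kernel in kB.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        match = _HUGEPAGE_FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_hugepages(path="/proc/meminfo"):
    """Read and parse the hugepage counters of the current host."""
    with open(path) as handle:
        return parse_hugepages(handle.read())
```

If HugePages_Total is 0, a server configuration that requests hugepages cannot be satisfied until the pages are allocated on the host.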


Re: Huge latency observed by DAOS client

Lombardi, Johann
 

Hi Kedar,

 

I assume that you are passing event = NULL to daos_kv_put() and only have one operation in flight, right?

In the “normal” case (i.e. no membership change, no restart), the bulk of the work has been done once dc_rw_cb() completes and daos_kv_put() should return shortly after that. Only some minor clean-ups (e.g. freeing up allocated memory) remain and this should definitely not take 0.72ms. Could you please profile the APP and find out where the bulk of the time is spent?
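
As a first step before a full profiler run, it is often enough to time the call in a loop and look at the spread of the samples. The sketch below shows the idea in a generic form; measure_latency_ms is a made-up helper and `operation` is a stand-in for whatever wraps the real put call in the application. It does not call any DAOS API.

```python
import statistics
import time


def measure_latency_ms(operation, iterations=100):
    """Time a blocking callable and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), computed with integers.
    rank = (99 * len(samples) + 99) // 100
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p99": samples[rank - 1],
        "max": samples[-1],
    }
```

A large gap between the median and the maximum suggests occasional stalls, while a uniformly high median points at a fixed per-call cost.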

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "k.patwardhan via groups.io" <k.patwardhan@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Saturday 6 February 2021 at 01:37
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Huge latency observed by DAOS client

 

Hello DAOS community,

 

 

I have a rather strange observation when working with DAOS. The DAOS client system reports way higher latency than I anticipated and am wondering if someone knows whether this is a known problem.

 

The diagram attached depicts my test setup.

 

1.      What I see is that the client issues a single Put request (16 byte key, 4 KB value) “invoking daos_kv_put()” to the DAOS server.

2.      The DAOS server (master) responds to the Put request in about 0.28 msec.

 

3.      However, the client reports an overall latency of 1 msec. Client latency is time taken by daos_kv_put() to complete.

 

Observations:

1. Sometimes daos_kv_put() returns before the dc_rw_cb() callback is called. Is that normal?

2. Often dc_rw_cb() gets called quickly, but the client does some work related to the Mercury module before returning from daos_kv_put(). Any idea what the client is doing after dc_rw_cb() has been called?

               2.1 My observation above, that the server responds in 0.28 msec but the client returns from daos_kv_put() after 1 msec, relates to this case. I believe dc_rw_cb() gets called in response to the DAOS server finishing the Put request, but the client still does some work afterwards and I don’t understand why.

 

The client seems to call some Mercury functions between the time the server responds and the time daos_kv_put() returns. Is that expected? Also, any idea what is going on in between that increases the overall latency from 0.28 msec to 1 msec?

 

Thanks in advance and looking forward to root causing the problem soon.

 

Regards,

Kedar


Re: Error in installing via Docker and help needed for integrating with REST API

asharma@...
 

Hi,
Yes, I am running the container with the following command: $ sudo docker run -it -d --privileged --cap-add=ALL --name server -v /dev:/dev <Image ID>

The exact steps I am using for the entire installation are as follows (steps 4-10 correspond to the DAOS installation with Docker):

  1) Set up a VM on Azure (16 GiB DRAM).
  2) Install Docker on the VM: https://docs.docker.com/engine/install/centos/
  3) Install Git on the VM.
  4) Clone the master branch from GitHub: $ git clone https://github.com/daos-stack/daos.git
  5) Enter the root directory of the DAOS project: $ cd daos
  6) Build the Docker image from the root directory: $ sudo docker build https://github.com/daos-stack/daos.git#master -f utils/docker/Dockerfile.centos.7
  7) Check that the image was created: $ sudo docker images
  8) Run a container with the newly created image: $ sudo docker run -it -d --privileged --cap-add=ALL --name server -v /dev:/dev <Image ID>
  9) Copy the project source tree into the container: $ sudo docker cp ./daos server:/home/daos
  10) Start a server inside the container: $ sudo docker exec server daos_server start \
          -o /home/daos/daos/utils/config/examples/daos_server_local.yml

    Note: Although step 9 is not mentioned in the documentation, without it I get the error: No such file or directory at /daos/utils/config/examples/daos_server_local.yml


On Mon, Feb 8, 2021, at 05:06 AM, Lombardi, Johann wrote:

Hi,

 

The dmg format command actually connects to the server to initiate the format. In your case, the server failed to start, so the “refused connection” is expected. I assume that you are running the server container with --privileged --cap-add=ALL -v /dev:/dev as specified in the documentation, right?


Re: Error in installing via Docker and help needed for integrating with REST API

Nabarro, Tom
 

If NVMe is not required then try setting "nr_hugepages: 0" in the global section of the server configuration file.

 

Regards,

Tom

 


---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Error in installing via Docker and help needed for integrating with REST API

Lombardi, Johann
 

Hi,

 

The dmg format command actually connects to the server to initiate the format. In your case, the server failed to start, so the “refused connection” is expected. I assume that you are running the server container with --privileged --cap-add=ALL -v /dev:/dev as specified in the documentation, right?
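
A quick way to tell a server that is not running apart from a dmg problem is to test whether anything accepts connections on the control port before running dmg. The sketch below is an illustration only: port_open is a made-up helper name, it relies on bash's /dev/tcp redirection and the coreutils timeout command, and localhost:10001 is the address from the error message quoted below.

```shell
# Return 0 if something accepts TCP connections on host:port, non-zero otherwise.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
# (port_open is a hypothetical helper name, not part of DAOS.)
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Only attempt "dmg -i storage format" once the control port answers.
if port_open localhost 10001; then
    echo "control port is open; dmg should be able to connect"
else
    echo "nothing listening on localhost:10001; check /tmp/daos_control.log first"
fi
```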

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "asharma@..." <asharma@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Sunday 7 February 2021 at 14:46
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error in installing via Docker and help needed for integrating with REST API

 

Hi Johann,

Thank you for all the help. The master definitely worked, and I was able to get a container running. When I run sudo docker exec server daos_server start -o /home/daos/daos/utils/config/examples/daos_server_local.yml to start the server, I get the following errors:

DAOS Server config loaded from /home/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

ERROR: server: code = 647 description = "requested 4096 hugepages; got 0"

ERROR: server: code = 647 resolution = "reboot the system or manually clear /dev/hugepages as appropriate"

I read in an older mail that these errors can be treated as warnings and ignored, and that the server would still run. Is that right?
So I went ahead and tried to run: $ docker exec server dmg -i storage format
But this command gives the following error: the server at localhost:10001 refused the connection

 

My understanding was that if I use the daos_server_local.yml file, I do not need authentication, because the config file sets the flag allow_insecure: true.

My goal is just to have a basic DAOS server running and to be able to use the management API on it, so that I can integrate my client application with the DAOS server through the management API.



Re: Client application single value KV Put high latency using multiple threads (pthread)

Lombardi, Johann
 

Hi Ping,

 

The daos_agent is actually only needed on the client nodes (where the application runs).

Please find attached some configuration files that I use on an Ethernet network with the tcp provider.

I will collect some numbers later this week with “daos pool autotest” so that we can hopefully compare it with your results.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "ping.wong via groups.io" <ping.wong@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday 4 February 2021 at 22:20
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Client application single value KV Put high latency using multiple threads (pthread)

 

[Edited Message Follows]

Hi Johann,

I ran the servers (Test46, Test48) and the client (Test60) on different nodes. They all run in the foreground, including daos_server and daos_agent. On each node, I press Ctrl-C to stop them and then restart them individually. There is only one agent running on each node. Is there any information cached somewhere that I need to remove before restarting the servers and agents?

My understanding is that each node should have an agent running. In my case, I have 3 agents running, one on each node.

Please give me examples of what I should set in the daos_agent.yml, daos_server.yml and daos_control.yml files on each node in terms of access_points and hostlist. I'd like to set up the servers (Test46 and Test48) as a replication cluster. The client is Test60.

I must have misconfigured my environment. Please correct me.

Thanks
Ping



New DAOS Setup

ayushsharma.ufl@...
 

Hi 

I was trying to set up a simple single DAOS server. I built a docker image using the following commands: - 

- $ git clone https://github.com/daos-stack/daos.git
- $ cd daos
- $ sudo docker build https://github.com/daos-stack/daos.git#master -f utils/docker/Dockerfile.centos.7
- $ docker run -it -d --privileged --cap-add=ALL --name server -v /dev:/dev 76379d4a8832

This gets a container running. 
After this, I believe I am supposed to start the server with $ sudo docker exec server daos_server start -o /home/daos/daos/utils/config/examples/daos_server_local.yml
I get the error: No such file or directory /home/daos/daos/utils/config/examples/daos_server_local.yml

So I copied the cloned DAOS source tree into the running container (this feels like a hack and was not mentioned in the documentation) with: $ sudo docker cp ./daos server:/home/daos
After this, when I run:
- $ sudo docker exec server daos_server start -o /home/daos/daos/utils/config/examples/daos_server_local.yml


I get the following error log: - 

$ sudo docker exec server daos_server start \

>         -o /home/daos/daos/utils/config/examples/daos_server_local.yml

DAOS Server config loaded from /home/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

ERROR: server: code = 647 description = "requested 4096 hugepages; got 0"

 

ERROR: server: code = 647 resolution = "reboot the system or manually clear /dev/hugepages as appropriate"

Can anyone help me understand what I am doing wrong, or what could be the reason for this? I am unable to find a similar error anywhere. Maybe I am getting something basic wrong, or my config or hardware simply doesn't support this.
I am running all of this on a CentOS virtual machine with 16 GiB of memory on the Azure cloud.
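
For reference, the hugepage counters the kernel currently reports on the VM can be read from /proc/meminfo. The following is a minimal sketch: hugepage_count is a made-up helper name, and it assumes a Linux host where /proc/meminfo exposes the HugePages_Total and HugePages_Free fields.

```shell
# Print one hugepage counter from a meminfo-style file.
#   $1: field name, e.g. HugePages_Total or HugePages_Free
#   $2: file to read (defaults to /proc/meminfo)
# (hugepage_count is a hypothetical helper name, not part of DAOS.)
hugepage_count() {
    awk -v key="$1:" '$1 == key {print $2}' "${2:-/proc/meminfo}"
}

echo "total=$(hugepage_count HugePages_Total) free=$(hugepage_count HugePages_Free)"
```

A total of 0 would be consistent with the "requested 4096 hugepages; got 0" message above.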


Any help will be deeply appreciated.
Regards

 


Huge latency observed by DAOS client

k.patwardhan@...
 

Hello DAOS community,

 

 

I have a rather strange observation when working with DAOS. The DAOS client system reports much higher latency than I anticipated, and I am wondering whether this is a known problem.

 

The diagram attached depicts my test setup.

 

1.      The client issues a single Put request (16-byte key, 4 KB value) to the DAOS server by invoking daos_kv_put().

2.      The DAOS server (master) responds to the Put request in about 0.28 msec.

3.      However, the client reports an overall latency of 1 msec. Client latency is the time taken by daos_kv_put() to complete.

 

Observations:

1. Sometimes daos_kv_put() returns before the dc_rw_cb() callback is called. Is that normal?

2. Often dc_rw_cb() gets called quickly, but the client does some work related to the Mercury module before returning from daos_kv_put(). Any idea what the client is doing after dc_rw_cb() has been called?

               2.1 My observation above (the server responding in 0.28 msec, but the client returning from daos_kv_put() after 1 msec) relates to this case. I believe dc_rw_cb() gets called in response to the DAOS server finishing the Put request, but the client then does additional work whose purpose I don't understand.

 

The client seems to call some Mercury functions between the time the server responds and the time daos_kv_put() returns. Is that expected? Also, any idea what is going on in between that increases the overall latency from 0.28 msec to 1 msec?

 

Thanks in advance and looking forward to root causing the problem soon.

 

Regards,

Kedar


