Error on simple test on POSIX container


Yunjae Lee

Hi,

I created a POSIX container, mounted it at /mnt/dfuse on the client node,
and ran the following commands:
```
# echo "foo" > /mnt/dfuse/bar
# cat /mnt/dfuse/bar
```

But this produces the following error, repeated endlessly:
```
object ERR src/object/cli_shard.c:631 dc_rw_cb() rpc 0x7ffa3801d6e0 opc 1 to rank 0 tag 7 failed: DER_HG(-1020): 'Transport layer mercury error'
```

OS: Ubuntu 20.04
Network: InfiniBand with MOFED 5.0-2
DAOS version: c20c47 (commit from 2020-11-28)


Yunjae Lee

It seems to be related to the size of the file:
when the file is smaller than 4k, reading it back with cat fails.
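To narrow down the size threshold, a small sweep could help. This is a minimal sketch, not a DAOS tool: the function names (check_size, sweep), the list of sizes, and the 10-second timeout are hypothetical choices. For each size it writes a file of random bytes into the mount, reads it back with cat, and compares the result with the original. The timeout is intended to turn a read that hangs (as with the repeated DER_HG errors) into a failure. Depending on caching, a read may be served from the kernel page cache instead of DAOS, so a pass is not proof that the data went through the fabric.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write SIZE random bytes into DIR, read the file back
# with cat, and compare it with the original. Returns 0 on a match.
# The 10 s timeout is meant to turn a hanging read into a failure.
check_size() {
    local dir=$1 size=$2
    local src target rc
    src=$(mktemp) || return 2
    target="$dir/size_test_$size"
    head -c "$size" /dev/urandom > "$src"
    if ! cp "$src" "$target"; then
        rm -f "$src"
        return 1
    fi
    if timeout 10 cat "$target" | cmp -s - "$src"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$src" "$target"
    return "$rc"
}

# Try sizes on both sides of the 4k boundary reported in this thread.
# DIR defaults to the mount point used here (/mnt/dfuse).
sweep() {
    local dir=${1:-/mnt/dfuse}
    local size
    for size in 1 512 4095 4096 4097 8192 65536; do
        if check_size "$dir" "$size"; then
            echo "size $size: ok"
        else
            echo "size $size: FAILED"
        fi
    done
}
```

Running sweep /mnt/dfuse on the client prints one ok or FAILED line per size, which would show whether the failures stop exactly at 4096 bytes.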


Lombardi, Johann

Hi there,

I assume that you are using “ofi+verbs;ofi_rxm” as the provider, right?

Cheers,
Johann

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 1 December 2020 at 06:45
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Yunjae Lee

Hi Johann,

Yes, I'm using "ofi+verbs;ofi_rxm".

I think the problem is independent of DFS itself, since issuing small I/Os directly through DFS showed no errors.


Thanks,
Yunjae


Lombardi, Johann

Hi there,

The fact that you can only reproduce this mercury/transport error with dfuse and not with DFS is interesting.

I have just tried on CentOS and couldn’t reproduce this on the latest master. I might have to try with Ubuntu …

Cheers,
Johann

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 8 December 2020 at 15:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container


Yunjae Lee

Hi Johann,

I've also seen the problem with v1.1.2 on Ubuntu 20.04.
After I reinstalled CentOS 7.7 on the server machine, the problem went away, consistent with your experiment.
Could there be a compatibility issue with the Ubuntu kernel or FUSE version?
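To compare the Ubuntu and CentOS machines side by side, one could run a small report on each node. This is a hypothetical helper, not part of DAOS; it assumes /etc/os-release exists (it does on both Ubuntu 20.04 and CentOS 7) and that the libfuse 3 tools, if installed, provide fusermount3. It only prints the distribution, the kernel release, and the fusermount3 version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distribution, kernel release, and libfuse 3
# userspace version of the current node, one "key: value" line each.
report_env() {
    local os="unknown"
    if [ -r /etc/os-release ]; then
        # $(...) runs in a subshell, so os-release variables do not leak.
        os=$(. /etc/os-release && echo "$PRETTY_NAME")
    fi
    echo "os: $os"
    echo "kernel: $(uname -r)"
    if command -v fusermount3 >/dev/null 2>&1; then
        # fusermount3 -V reports the installed libfuse 3 version.
        echo "fuse: $(fusermount3 -V 2>&1)"
    else
        echo "fuse: fusermount3 not found"
    fi
}
```

Running report_env on the Ubuntu 20.04 and CentOS 7.7 nodes and diffing the output shows which of these components actually differ between the working and failing setups.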

Thanks,
Yunjae


Lombardi, Johann

Hi Yunjae,

I have just tried with Ubuntu 20.04 and couldn’t reproduce the problem. That said, I am using the sockets provider, not ofi+verbs;ofi_rxm as in your case.

Could you please confirm whether it works on your side if you switch to ofi+sockets? If it does, the issue comes from the combination of FUSE and IB.
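A minimal sketch of that switch, assuming the server configuration is a DAOS 1.x style daos_server.yml with a single top-level provider: line (for example provider: ofi+verbs;ofi_rxm). The set_provider function is hypothetical and so is the /etc/daos/daos_server.yml path used below; adjust both to the actual setup. The function keeps a .bak copy of the original file and refuses to edit a file that has no such line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the top-level "provider:" line of a DAOS 1.x
# style daos_server.yml, keeping the original as FILE.bak.
# Assumes provider names contain no '|' or '&' (special in this sed command).
set_provider() {
    local file=$1 provider=$2
    if ! grep -q '^provider:' "$file"; then
        echo "set_provider: no top-level 'provider:' line in $file" >&2
        return 1
    fi
    sed -i.bak "s|^provider:.*|provider: ${provider}|" "$file"
}
```

For example, set_provider /etc/daos/daos_server.yml ofi+sockets, after which the DAOS servers need to be restarted to pick up the new provider; ofi+verbs;ofi_rxm can be restored the same way or from the .bak file.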

Thanks.

Cheers,
Johann

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 14 December 2020 at 10:36
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container


Yunjae Lee

Hi Johann,

I'm currently testing some DAOS features on the servers running CentOS 7.7.
Once that test is done, I can try the sockets provider on Ubuntu 20.04.
I'll let you know whether it works.

Thanks,
Yunjae


Lombardi, Johann

Hi Yunjae,

Any progress? Thanks in advance.

Cheers,
Johann

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 21 December 2020 at 05:28
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container