Thank you for the sanity check regarding dfs_lookup. After a little sleuthing, the application (evidently) was modifying the effective UID/GID around the time of that lookup. And it was this *ID change that made networking fail. With those calls changed, DFS is now doing what I expected/thought/hoped 🙂
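For anyone who runs into the same symptom, here is a minimal sketch of a guard that can help catch this kind of credential drift. It records the effective UID/GID once (for example, right after the DFS mount is set up) and lets the application check that they are unchanged before calling into DFS. The names id_snapshot, id_snapshot_take and id_snapshot_matches are hypothetical helpers, not part of DAOS, and the idea that the IDs should stay constant across DFS calls is an assumption drawn from the behavior described above.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: effective IDs captured at a known-good point. */
struct id_snapshot {
	uid_t euid;
	gid_t egid;
};

/* Capture the current effective UID/GID of the process. */
static struct id_snapshot
id_snapshot_take(void)
{
	struct id_snapshot s;

	s.euid = geteuid();
	s.egid = getegid();
	return s;
}

/* Return true if the effective UID/GID still match the snapshot. */
static bool
id_snapshot_matches(const struct id_snapshot *s)
{
	return s->euid == geteuid() && s->egid == getegid();
}
```

A caller could take the snapshot once during initialization and log a warning (or restore the IDs) whenever id_snapshot_matches() returns false ahead of a DFS call.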
From: firstname.lastname@example.org <email@example.com> on behalf of Chaarawi, Mohamad <mohamad.chaarawi@...>
Sent: Tuesday, April 5, 2022 5:21 AM
To: firstname.lastname@example.org <email@example.com>
Subject: Re: [daos] dfs_lookup behavior for non-existent files?
Neither dfs_lookup nor dfs_stat sets st_ino in the stat buf.
The reason is that files are uniquely identified by the DAOS object ID, which is 128 bits (64 hi, 64 lo).
You can retrieve that using dfs_obj2id():
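As a rough sketch for a port that needs something to put in st_ino: dfs_obj2id(obj, &oid) fills a daos_obj_id_t, which has 64-bit lo and hi members. The block below uses a stand-in struct (oid128, a hypothetical name) so that it compiles without the DAOS headers, and it folds the two halves with XOR. The fold is an illustrative assumption, not something DAOS prescribes, and distinct object IDs can collide once reduced to 64 bits.

```c
#include <stdint.h>

/* Stand-in for daos_obj_id_t (hypothetical name; mirrors its lo/hi halves). */
struct oid128 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Fold a 128-bit object ID into a 64-bit value that could be stored in
 * st_ino. Assumption: XOR folding is only an illustrative choice, and
 * two different object IDs may map to the same result.
 */
static uint64_t
oid_to_ino(struct oid128 oid)
{
	return oid.hi ^ oid.lo;
}

/*
 * With the DAOS headers available, the intended use would be roughly:
 *
 *   daos_obj_id_t oid;
 *   rc = dfs_obj2id(obj, &oid);
 *   if (rc == 0)
 *           sb->st_ino = oid_to_ino((struct oid128){ oid.lo, oid.hi });
 */
```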
As for the other error, that seems weird. The errors are coming from the network layer. At that point, were any servers down or killed (specifically the engine with rank 1)? That would explain the errors.
When I try this myself, I get ENOENT for lookup on “//.Trash” as expected.
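Since DFS calls return 0 on success and a positive errno value (such as ENOENT) on failure, an lstat-style wrapper usually needs to translate that into the POSIX convention of returning -1 and setting errno. A minimal sketch, where dfs_rc_to_posix is a hypothetical helper name and not part of the DFS API:

```c
#include <errno.h>

/*
 * Translate a DFS-style return code (0 on success, positive errno value
 * on failure) into the POSIX convention used by lstat(): 0 on success,
 * -1 with errno set on failure. Hypothetical helper, not part of DAOS.
 */
static int
dfs_rc_to_posix(int rc)
{
	if (rc == 0)
		return 0;
	errno = rc;
	return -1;
}
```

With this in place, a missing path such as "//.Trash" would surface to the caller as -1 with errno set to ENOENT, the same way lstat() reports it.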
From: firstname.lastname@example.org <email@example.com> on behalf of Tuffli, Chuck <chuck.tuffli@...>
I'm porting an existing application to use DFS (DAOS v2.0.2) instead of POSIX and need help understanding the error messages printed to the console.
The code is using dfs_lookup() to retrieve the struct stat of a file. Note the implementation cannot use dfs_stat() because the application requires valid values for fields such as st_ino, which dfs_stat() does not provide. The code in question is:
int d_lstat(const char * restrict path, struct stat * restrict sb)
{
	dfs_obj_t *obj = NULL;
	int rc = dfs_lookup(dfs, path, O_RDONLY, &obj, NULL, sb); /* fills *sb on success */
	/* ... */
}
If the file path exists (e.g. "/"), this works. But if the path doesn't exist (e.g. "//.Trash"), the call to dfs_lookup() does not return. Instead, the console endlessly prints messages like:
04/04-16:28:50.90 xxxxx DAOS[1178648/1178648/0] external ERR # [6937851.329315] mercury->msg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/na/na_ofi.c:2972
# na_ofi_msg_send(): fi_tsend() failed, rc: -13 (Permission denied)
04/04-16:28:50.90 xxxxx DAOS[1178648/1178648/0] external ERR # [6937851.329374] mercury->hg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/mercury_core.c:2727
# hg_core_forward_na(): Could not post send for input buffer (NA_ACCESS)
04/04-16:28:50.90 xxxxx DAOS[1178648/1178648/0] hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb(0x1d0cd40) [opc=0x4070001 (DAOS) rpcid=0x63f8133700000008 rank:tag=1:2] RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
04/04-16:28:50.90 xxxxx DAOS[1178648/1178648/0] object ERR src/object/cli_shard.c:889 dc_rw_cb() RPC 1 failed, DER_HG(-1020): 'Transport layer mercury error'
Am I misusing dfs_lookup()?