[External] Re: [daos] dfs_stat and infinitely loop


Shengyu SY19 Zhang
 

Hi Mohamad,

 

Yes, the code works without setgid(0) (or similar functions that change the user context). The client log shows nothing related. The call loops indefinitely in the poll, as if a network packet had been lost. This is the gdb stack trace:

 

#0  0x00007f1966699e63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f19658dc728 in hg_poll_wait (poll_set=0x85d990, timeout=timeout@entry=1, progressed=progressed@entry=0x7ffd6af1f49f "") at /root/daos/_build.external/mercury/src/util/mercury_poll.c:434
#2  0x00007f1965d05763 in hg_core_progress_poll (context=0x895b70, timeout=1) at /root/daos/_build.external/mercury/src/mercury_core.c:3280
#3  0x00007f1965d0a94c in HG_Core_progress (context=<optimized out>, timeout=timeout@entry=1) at /root/daos/_build.external/mercury/src/mercury_core.c:4877
#4  0x00007f1965d0242d in HG_Progress (context=context@entry=0x77f250, timeout=timeout@entry=1) at /root/daos/_build.external/mercury/src/mercury.c:2243
#5  0x00007f1966dfb28b in crt_hg_progress (hg_ctx=hg_ctx@entry=0x8909b8, timeout=timeout@entry=1000) at src/cart/crt_hg.c:1366
#6  0x00007f1966dbcf2b in crt_progress (crt_ctx=0x8909a0, timeout=timeout@entry=-1, cond_cb=cond_cb@entry=0x7f196772d5a0 <ev_progress_cb>, arg=arg@entry=0x7ffd6af1f5d0) at src/cart/crt_context.c:1300
#7  0x00007f19677328c6 in daos_event_priv_wait () at src/client/api/event.c:1205
#8  0x00007f1967736096 in dc_task_schedule (task=0x8a3be0, instant=instant@entry=true) at src/client/api/task.c:139
#9  0x00007f196773492c in daos_obj_fetch (oh=..., oh@entry=..., th=..., th@entry=..., flags=flags@entry=0, dkey=dkey@entry=0x7ffd6af1f6d0, nr=nr@entry=1, iods=iods@entry=0x7ffd6af1f6f0, sgls=sgls@entry=0x7ffd6af1f6b0, maps=maps@entry=0x0, ev=ev@entry=0x0) at src/client/api/object.c:170
#10 0x00007f19674f810a in fetch_entry (oh=oh@entry=..., th=..., th@entry=..., name=0x941808 "/", fetch_sym=fetch_sym@entry=true, exists=exists@entry=0x7ffd6af1f84f, entry=0x7ffd6af1f860) at src/client/dfs/dfs.c:329
#11 0x00007f19674fb4cf in entry_stat (dfs=dfs@entry=0x941770, th=th@entry=..., oh=..., name=name@entry=0x941808 "/", stbuf=stbuf@entry=0x7ffd6af1f9c0) at src/client/dfs/dfs.c:490
#12 0x00007f19675072e7 in dfs_stat (dfs=0x941770, parent=0x9417d8, name=0x941808 "/", stbuf=0x7ffd6af1f9c0) at src/client/dfs/dfs.c:2876
#13 0x00000000004012c3 in main ()

 

Regards,

Shengyu.

 

From: <daos@daos.groups.io> on behalf of "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, February 19, 2020 at 11:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [External] Re: [daos] dfs_stat and infinitely loop

 

Hi Shengyu,

 

Does it work if you don’t call setgid(0)? I’m not sure why that would cause the operation not to return.

Could you please attach gdb and return a trace of where it hangs? Do you see anything suspicious in the DAOS client log?

 

Thanks,

Mohamad

 

From: <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, February 18, 2020 at 9:35 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] dfs_stat and infinitely loop

 

Hello,

 

I recently ran into an issue: when I call dfs_stat in my code, it never returns. I have found the basic cause, but I don’t have a solution yet. This is the sample code:

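        /* Mount the DFS namespace on the already opened container. */
        rc = dfs_mount(dfs_poh, coh, O_RDWR, &dfs1);
        if (rc != -DER_SUCCESS) {
                printf("Failed to mount to container (%d)\n", rc);
                D_GOTO(out_dfs, 0);
        }

        /* This call triggers the hang, even when it does not change the gid. */
        setgid(0);

        struct stat stbuf = {0};

        /* NULL parent and NULL name: stat the root of the DFS namespace. */
        rc = dfs_stat(dfs1, NULL, NULL, &stbuf);
        if (rc)
                printf("stat '' failed, rc: %d\n", rc);
        else
                printf("stat '' succeeded, rc: %d\n", rc);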

 

Whenever setgid(0) is called, the problem always happens, even if the call does not change the current gid. I’m working on a DAOS Samba plugin, which performs many similar user context switch operations.
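
To confirm that the trigger is the call itself rather than an actual credential change, a small check like the one below can be run in the same process before the DFS calls. This is a minimal sketch using only POSIX calls; setgid_to_current_gid is a hypothetical helper name and is not part of DAOS or Samba.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call setgid() with the gid the process already
 * has and report whether the real or effective gid changed.
 * Returns 0 if both ids are unchanged, 1 if either changed,
 * and -1 if setgid() itself failed.
 */
int setgid_to_current_gid(void)
{
        gid_t rgid_before = getgid();
        gid_t egid_before = getegid();

        /* Setting the gid to the current real gid should be a no-op. */
        if (setgid(rgid_before) != 0)
                return -1;

        return (getgid() == rgid_before && getegid() == egid_before) ? 0 : 1;
}
```

If this returns 0 and dfs_stat still hangs afterwards, the credentials seen by the process are identical before and after the call, which would point at a side effect of setgid() rather than at the ids themselves.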

 

Regards,

Shengyu.


Patrick Farrell <paf@...>
 

Are you using OPA?  I believe there are some issues with network contexts and different users in OPA...?
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Sent: Wednesday, February 19, 2020 7:41:09 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [External] Re: [daos] dfs_stat and infinitely loop
 
