Date
1 - 3 of 3
CPU and Log Activity after Server Startup
Steffen Christgau
Hi everybody,
I think, I managed to successfully install DAOS on a small test-bed consisting of two nodes. Both machines run an up-to-date installation of CentOS 7. In addition, libfabric 1.8.0, isal 2.27, pmix 3.1.3, Open MPI 4.0.1, hwloc 1.11.2 and protobuf 3.9.0 with protobuf-c 1.3.2 have been manually compiled and installed on these two machines. DAOS v0.6 has been compiled from the sources using the prebuilt packages from above. DAOS runs under a dedicated user and group id (daos:daos), which have access to the pmem and hugetable mount points. The machines which run DAOS have NVDIMMs installed but no NVMe storage yet (still in delivery :-(). However, the server is configured to use a file as NVMe emulation. Using orterun as documented in the admin guide the server starts up on both machines: ---------------------------------------------------------------- $ orterun -n 2 --map-by node --report-uri /tmp/daos_urifile --enable-recovery daos_server -t 1 -i -o $DAOS_ROOT/etc/daos_server.yml 2019/08/14 15:44:10 config.go:109: debug: DAOS config read from /opt/local/daos/0.6/etc/daos_server.yml 2019/08/14 15:44:10 config.go:150: debug: Active config saved to /opt/local/daos/0.6/etc/.daos_server.active.yml (read-only) 2019/08/14 15:44:10 config.go:361: debug: Switching control log level to DEBUG 2019/08/14 15:44:10 config.go:376: debug: daos_server logging to file /var/log/daos/control.log Starting SPDK v18.07-pre / DPDK 18.02.0 initialization... [ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1077140161 --base-virtaddr=0x200000000000 --proc-type=auto ] 2019/08/14 15:44:10 config.go:109: debug: DAOS config read from /opt/local/daos/0.6/etc/daos_server.yml 2019/08/14 15:44:10 config.go:150: debug: Active config saved to /opt/local/daos/0.6/etc/.daos_server.active.yml (read-only) 2019/08/14 15:44:10 config.go:361: debug: Switching control log level to DEBUG 2019/08/14 15:44:10 config.go:376: debug: daos_server logging to file /var/log/daos/control.log Starting SPDK v18.07-pre / DPDK 18.02.0 initialization... [ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk990821539 --base-virtaddr=0x200000000000 --proc-type=auto ] EAL: Detected 96 lcore(s) EAL: Auto-detected process type: PRIMARY EAL: Detected 96 lcore(s) EAL: Auto-detected process type: PRIMARY EAL: Multi-process socket /opt/local/daos/home/.spdk990821539_unix EAL: Probing VFIO support... EAL: Multi-process socket /opt/local/daos/home/.spdk1077140161_unix EAL: Probing VFIO support... no NVMe controllers found no NVMe controllers found Starting SPDK v18.07-pre / DPDK 18.02.0 initialization... [ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk990821539 --base-virtaddr=0x200000000000 --proc-type=auto ] EAL: Detected 96 lcore(s) EAL: Auto-detected process type: SECONDARY EAL: Multi-process socket /opt/local/daos/home/.spdk990821539_unix_23769_bc38cb16a9404 EAL: Probing VFIO support... Starting SPDK v18.07-pre / DPDK 18.02.0 initialization... [ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1077140161 --base-virtaddr=0x200000000000 --proc-type=auto ] EAL: Detected 96 lcore(s) EAL: Auto-detected process type: SECONDARY EAL: Multi-process socket /opt/local/daos/home/.spdk1077140161_unix_38799_10f8cc2f18d7bc EAL: Probing VFIO support... DAOS I/O server (v0.6.0) process 38799 started on rank 0 (out of 2) with 1 target xstream set(s), 2 helper XS per target, firstcore 0. DAOS I/O server (v0.6.0) process 23769 started on rank 1 (out of 2) with 1 target xstream set(s), 2 helper XS per target, firstcore 0. ---------------------------------------------------------------- What makes me wonder is that four CPU cores of every node are 100% busy which is caused by threads of daos_io_server process. Attaching gdb to the process/one of these threads show that the activity appears to come from the Argobots framework or at least it hangs in ABTI_xstream_schedule_thread (among others) when I interrupt. Is this an intended behavior or do I have take any actions to remove this behavior? In addition, but maybe unrelated to that issue, the following lines are added log server file approx. every 2 seconds: ---------------------------------------------------------------- 08/14-15:47:29.15 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:635 swim_progress() 0: dping 0 => 1 08/14-15:47:29.15 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:73 swim_dump_updates() 0 => 1: {1 A 0} {0 A 0} 08/14-15:47:29.15 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:73 swim_dump_updates() 0 <= 1: {1 A 0} 08/14-15:47:29.22 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:73 swim_dump_updates() 0 <= 1: {0 A 0} {1 A 0} 08/14-15:47:29.22 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:842 swim_parse_message() 0: dresp 0 => 1 08/14-15:47:29.22 apass2 DAOS[38799/38804] swim INFO src/swim/swim.c:73 swim_dump_updates() 0 => 1: {0 A 0} ---------------------------------------------------------------- (looks like a heartbeat ping) Since those lines are only INFOs is it safe to ignore them? Thanks in advance for your responses. Regards, Steffen
|
|
Li, Wei G
Hi Steffen,
toggle quoted messageShow quoted text
The CPU activity is expected. daos_io_server uses Argobots to execute multiple tasks on each pthread. Our scheduler does not yet have a complete knowledge of all expected events in order to decide when it is safe to block each pthread without risking an undesirable latency for the forthcoming event. The pthreads therefore keep polling all event sources in loops. The swim INFO messages are harmless, though I would agree that they should be less verbose. Regards, Li Wei
-----Original Message-----
|
|
Steffen Christgau
Thank you for the clarification, Li.
toggle quoted messageShow quoted text
Regards, Steffen
On 15/08/2019 05.33, Li, Wei G wrote:
Hi Steffen,
|
|