Re: CPU NUMA node bind error
Rosenzweig, Joel B
On the client side, daos_agent examines the NUMA binding of the client application's PID and automatically assigns the client a network interface with matching NUMA affinity. If the client is bound to a NUMA node that has no compatible network interface, or isn’t bound at all, the agent assigns an interface from the default NUMA node. For best performance, bind your client application to a NUMA node that matches one of the network interfaces available to the daos_agent running on your client node.
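To make the selection rule concrete, here is a minimal Python sketch of the behavior as I described it. This is not the daos_agent code (the agent is not written in Python); `pick_interface`, the `ifaces_by_numa` mapping, and the interface names are all hypothetical, and taking the first interface on a node is an assumption made only to keep the example short.

```python
from typing import Dict, List, Optional


def pick_interface(client_numa: Optional[int],
                   ifaces_by_numa: Dict[int, List[str]],
                   default_numa: int) -> str:
    """Return an interface name for a client, preferring its own NUMA node."""
    # The client is bound to a node that has at least one compatible interface.
    if client_numa is not None and ifaces_by_numa.get(client_numa):
        # Assumption: take the first interface listed for that node.
        return ifaces_by_numa[client_numa][0]
    # Unbound client, or no interface on its node: use the default NUMA node.
    return ifaces_by_numa[default_numa][0]


# Hypothetical host with one interface per NUMA node.
topology = {0: ["ib0"], 1: ["ib1"]}
print(pick_interface(1, topology, default_numa=0))     # bound to node 1 -> ib1
print(pick_interface(None, topology, default_numa=0))  # unbound -> ib0
print(pick_interface(2, topology, default_numa=0))     # no interface on node 2 -> ib0
```

The last two cases are the ones that cost performance: the client still gets an interface, but not one local to the cores it runs on.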
If the client is bound to a NUMA node without a compatible interface, performance will suffer. I wrote about this in more detail in /doc/admin/performance_tuning.md, and there is additional information about the mechanism in the “Get Attach Info” section of /src/control/cmd/daos_agent/README.md. That said, you can override the automatic selection and choose a specific interface for the client by setting OFI_INTERFACE=… in the client environment.
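As a rough illustration of the override, the sketch below launches a child process with OFI_INTERFACE set in its environment. The interface name `ib0` and the helper `client_env` are hypothetical, and the child command is only a stand-in that echoes the variable; substitute your real client command and an interface that exists on your node.

```python
import os
import subprocess
import sys


def client_env(interface: str) -> dict:
    """Copy the current environment and set OFI_INTERFACE to one interface."""
    env = dict(os.environ)
    env["OFI_INTERFACE"] = interface
    return env


# Stand-in for the real client: a child process that prints the variable it sees.
cmd = [sys.executable, "-c", "import os; print(os.environ['OFI_INTERFACE'])"]
subprocess.run(cmd, env=client_env("ib0"), check=True)
```

Only the child's environment is changed here, so other clients started from the same shell still get the automatic selection.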
The pinned_numa_node setting on the daos_server is separate from the settings that affect the client side; it only controls how the daos_io_server processes are bound. In the ideal case, a daos_server launches up to one daos_io_server process per NUMA node and matching network interface, and the ‘pinned_numa_node’ setting instructs that daos_io_server process to bind itself to cores with that NUMA affinity.
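For anyone unsure what "bind itself to cores matching that NUMA affinity" amounts to, here is a generic Linux sketch in Python. It is not how daos_io_server implements the binding; `parse_cpulist` and `bind_to_numa_node` are hypothetical helpers, and the sketch assumes a Linux host that exposes the per-node CPU list under /sys/devices/system/node.

```python
import os
from typing import Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single id like '5' has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def bind_to_numa_node(node: int) -> None:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process


print(sorted(parse_cpulist("0-3,8-11")))  # [0, 1, 2, 3, 8, 9, 10, 11]
```

Note that this restricts CPU placement only; it does not set a memory allocation policy for the node.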
From: firstname.lastname@example.org <email@example.com> On Behalf Of Niu, Yawei
Sent: Friday, October 16, 2020 11:17 AM
Subject: Re: [daos] CPU NUMA node bind error
The “bdh_io_channel != NULL” assert fires because a bio poll is called after the context has been freed during error cleanup. Could you open a ticket for it? Thanks!
<firstname.lastname@example.org> on behalf of Wu Huijun <huijunw91@...>
Patrick, thanks for your reply. I see. But for the server, the only change I made that triggered this error was to add "pinned_numa_node: 1" in the server config yml file...
On Fri, Oct 16, 2020 at 10:30 PM Farrell, Patrick Arthur <patrick.farrell@...> wrote: