Anyone seen this DAX oops before?


Kevan Rehm
 

I am not sure, unfortunately; we didn't notice when it first happened. We will keep an eye out in case it happens again.

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, May 6, 2020 at 1:11 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Anyone seen this DAX oops before?

 

It has been a while since I last saw a kernel backtrace 😊

Never seen this before. I assume that this happens during pool deletion or disconnect?

Cheers,

Johann
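Johann's guess fits the shape of the call trace quoted below. The warning fires from task_work_run -> __fput -> dput -> __dentry_kill -> iput -> evict. That is the path the kernel normally takes when the last reference to an already-unlinked file is dropped, for example a pool file that was deleted while a process still had it open or mapped. The Python sketch below walks through that file lifecycle. It is an illustration under that assumption, not a confirmed reproducer, and the helper name, path and size are hypothetical. On an ext4 filesystem mounted with -o dax, the final close would reach ext4_evict_inode and the DAX truncate path. On any other filesystem it simply deletes the file.

```python
import mmap
import os


def delete_while_mapped(path: str, size: int = 1 << 20) -> None:
    """Create, map, unlink, then close a file (hypothetical helper).

    The inode outlives the unlink because the descriptor and the mapping
    still reference it; dropping the last reference triggers eviction.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED) as m:
            m[:4] = b"DAOS"  # touch a page; on a DAX mount this maps pmem directly
            os.unlink(path)  # name removed, inode still alive
        # Leaving the with-block unmaps the region and drops the mapping's reference.
    finally:
        # Last reference: the kernel evicts the inode (deferred to task_work on
        # return to user space, as task_work_run in the trace suggests), which
        # on ext4 runs ext4_evict_inode -> truncate_inode_pages_final.
        os.close(fd)
```

If pool deletion is the trigger, correlating the oops timestamps with pool destroy operations in the DAOS logs would be one way to confirm it.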

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 6 May 2020 at 00:02
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Anyone seen this DAX oops before?

 

We had an oops today; the backtrace is below. When I started looking, I found six of these over the last month, always with the same backtrace. It appears to be happening in DAX. Does this look familiar?

 

Thanks, Kevan

 

# cat backtrace
WARNING: CPU: 18 PID: 43720 at fs/dax.c:419 dax_disassociate_entry+0xdb/0x130
Modules linked in: ext4 mbcache jbd2 vfio_pci virtio_pci virtio_ring virtio nfsv3 nfs_acl socwatch2_11(OE) vtsspp(OE) sep5(OE) socperf3(OE) pax(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_en(OE) mlx4_core(OE) sunrpc iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm vfat fat irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev mei_me mei lpc_ich i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem device_dax acpi_power_meter acpi_pad knem(OE) ip_tables xfs libcrc32c nd_pmem nd_btt ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul mlx5_core(OE) crct10dif_common crc32c_intel drm libahci mlxfw(OE) ptp pps_core libata vfio_mdev(OE) vfio_iommu_type1 vfio nvme mdev(OE) devlink nvme_core mlx_compat(OE) drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
CPU: 18 PID: 43720 Comm: daos_sys_0 Tainted: G           OE  ------------ T 3.10.0-1127.el7.x86_64 #1
Hardware name: Cray Inc. SYS-2029UZ-TN20R25M/X11DPU-Z+, BIOS 3.2 10/22/2019
Call Trace:
[<ffffffffb5b7ff85>] dump_stack+0x19/0x1b
[<ffffffffb549bd18>] __warn+0xd8/0x100
[<ffffffffb549be5d>] warn_slowpath_null+0x1d/0x20
[<ffffffffb56a6d0b>] dax_disassociate_entry+0xdb/0x130
[<ffffffffb56a7868>] __dax_invalidate_mapping_entry+0x68/0x120
[<ffffffffb56a9f47>] dax_delete_mapping_entry+0x17/0x50
[<ffffffffb55ce2fc>] truncate_exceptional_entry.part.13+0x1c/0x40
[<ffffffffb55ceaa2>] truncate_inode_pages_range+0x192/0x750
[<ffffffffb55cf0cf>] truncate_inode_pages_final+0x4f/0x60
[<ffffffffc0eeea0f>] ext4_evict_inode+0x10f/0x470 [ext4]
[<ffffffffb566b674>] evict+0xb4/0x180
[<ffffffffb566ba9c>] iput+0xfc/0x190
[<ffffffffb5666438>] __dentry_kill+0x158/0x1d0
[<ffffffffb5666ad5>] dput+0xb5/0x1a0
[<ffffffffb564f4dd>] __fput+0x18d/0x230
[<ffffffffb564f66e>] ____fput+0xe/0x10
[<ffffffffb54c31cb>] task_work_run+0xbb/0xe0
[<ffffffffb542cc65>] do_notify_resume+0xa5/0xc0
[<ffffffffb5b9322f>] int_signal+0x12/0x17
[root@delphi-002 oops-2020-05-05-11:59:34-44218-0]#
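To check how often the same warning shows up across reports (and across nodes), the saved backtraces can be grouped by signature. This is a minimal Python sketch. It assumes ABRT-style problem directories named like the oops-2020-05-05-11:59:34-44218-0 directory in the prompt above, each containing a plain-text backtrace file. The /var/spool/abrt root is also an assumption. The grouping key keeps the WARNING location and the function+offset of each call-trace frame, and it ignores CPU, PID and raw addresses, which change from one occurrence to the next.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: ABRT-style report directories named "oops-*" under this root,
# each holding a plain-text "backtrace" file. Adjust the path as needed.
REPORT_ROOT = Path("/var/spool/abrt")

# "WARNING: CPU: 18 PID: 43720 at fs/dax.c:419 dax_disassociate_entry+0xdb/0x130"
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")
# "[<ffffffffb5b7ff85>] dump_stack+0x19/0x1b" -> "dump_stack+0x19/0x1b"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\S+)")


def signature(text: str) -> tuple[str, ...]:
    """Warning location plus call-trace frames, minus CPU, PID and addresses."""
    m = WARN_RE.search(text)
    head = (m.group(1), m.group(2)) if m else ("<no WARNING line>",)
    return head + tuple(FRAME_RE.findall(text))


def count_signatures(root: Path) -> Counter:
    """Map each distinct signature to the number of reports that share it."""
    counts: Counter = Counter()
    for bt in sorted(root.glob("oops-*/backtrace")):
        counts[signature(bt.read_text(errors="replace"))] += 1
    return counts


if __name__ == "__main__" and REPORT_ROOT.is_dir():
    for sig, n in count_signatures(REPORT_ROOT).most_common():
        print(f"{n:4d}  {' '.join(sig[:2])}")
```

Run on a node, it prints one line per distinct signature with its count. Identical reports, like the six described above, collapse into a single line. Function offsets stay stable only within one kernel build, so reports from different kernels will be counted separately.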

 

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

