In the Linux kernel, the following vulnerability has been resolved: pinctrl: fix deadlock in create_pinctrl() when handling -EPROBE_DEFER In create_pinctrl(), pinctrl_maps_mutex is acquired before calling add_setting(). If add_setting() returns -EPROBE_DEFER, create_pinctrl() calls pinctrl_free(). However, pinctrl_free() attempts to acquire pinctrl_maps_mutex, which is already held by create_pinctrl(), leading to a potential deadlock. This patch resolves the issue by releasing pinctrl_maps_mutex before calling pinctrl_free(), preventing the deadlock. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.
In the Linux kernel, the following vulnerability has been resolved: net/mlx5: Fix missing lock on sync reset reload On sync reset reload work, when remote host updates devlink on reload actions performed on that host, it misses taking devlink lock before calling devlink_remote_reload_actions_performed() which results in triggering lock assert like the following: WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50 … CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S W 6.10.0-rc2+ #116 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core] RIP: 0010:devl_assert_locked+0x3e/0x50 … Call Trace: <TASK> ? __warn+0xa4/0x210 ? devl_assert_locked+0x3e/0x50 ? report_bug+0x160/0x280 ? handle_bug+0x3f/0x80 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? devl_assert_locked+0x3e/0x50 devlink_notify+0x88/0x2b0 ? mlx5_attach_device+0x20c/0x230 [mlx5_core] ? __pfx_devlink_notify+0x10/0x10 ? process_one_work+0x4b6/0xbb0 process_one_work+0x4b6/0xbb0 […]
In the Linux kernel, the following vulnerability has been resolved: usb: dwc3: core: remove lock of otg mode during gadget suspend/resume to avoid deadlock When config CONFIG_USB_DWC3_DUAL_ROLE is selected, and trigger system to enter suspend status with below command: echo mem > /sys/power/state There will be a deadlock issue occurring. Detailed invoking path as below: dwc3_suspend_common() spin_lock_irqsave(&dwc->lock, flags); <-- 1st dwc3_gadget_suspend(dwc); dwc3_gadget_soft_disconnect(dwc); spin_lock_irqsave(&dwc->lock, flags); <-- 2nd This issue is exposed by commit c7ebd8149ee5 ("usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend") that removes the code of checking whether dwc->gadget_driver is NULL or not. It causes the following code is executed and deadlock occurs when trying to get the spinlock. In fact, the root cause is the commit 5265397f9442("usb: dwc3: Remove DWC3 locking during gadget suspend/resume") that forgot to remove the lock of otg mode. So, remove the redundant lock of otg mode during gadget suspend/resume.
In the Linux kernel, the following vulnerability has been resolved: exfat: fix potential deadlock on __exfat_get_dentry_set When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array is allocated in __exfat_get_entry_set. The problem is that the bh-array is allocated with GFP_KERNEL. It does not make sense. In the following cases, a deadlock for sbi->s_lock between the two processes may occur. CPU0 CPU1 ---- ---- kswapd balance_pgdat lock(fs_reclaim) exfat_iterate lock(&sbi->s_lock) exfat_readdir exfat_get_uniname_from_ext_entry exfat_get_dentry_set __exfat_get_dentry_set kmalloc_array ... lock(fs_reclaim) ... evict exfat_evict_inode lock(&sbi->s_lock) To fix this, let's allocate bh-array with GFP_NOFS.
In MariaDB before 10.9.2, compress_write in extra/mariabackup/ds_compress.cc does not release data_mutex upon a stream write failure, which allows local users to trigger a deadlock.
In the Linux kernel, the following vulnerability has been resolved: riscv: kexec: Avoid deadlock in kexec crash path If the kexec crash code is called in the interrupt context, the machine_kexec_mask_interrupts() function will trigger a deadlock while trying to acquire the irqdesc spinlock and then deactivate irqchip in irq_set_irqchip_state() function. Unlike arm64, riscv only requires irq_eoi handler to complete EOI and keeping irq_set_irqchip_state() will only leave this possible deadlock without any use. So we simply remove it.
In the Linux kernel, the following vulnerability has been resolved: cachefiles: add missing lock protection when polling Add missing lock protection in poll routine when iterating xarray, otherwise: Even with RCU read lock held, only the slot of the radix tree is ensured to be pinned there, while the data structure (e.g. struct cachefiles_req) stored in the slot has no such guarantee. The poll routine will iterate the radix tree and dereference cachefiles_req accordingly. Thus RCU read lock is not adequate in this case and spinlock is needed here.
In the Linux kernel, the following vulnerability has been resolved: i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr When del_timer_sync() is called in an interrupt context it throws a warning because of potential deadlock. The timer is used only to exit from wait_for_completion() after a timeout so replacing the call with wait_for_completion_timeout() allows to remove the problematic timer and its related functions altogether.
In the Linux kernel, the following vulnerability has been resolved: io_uring: fix possible deadlock in io_register_iowq_max_workers() The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock.
In the Linux kernel, the following vulnerability has been resolved: batman-adv: bypass empty buckets in batadv_purge_orig_ref() Many syzbot reports are pointing to soft lockups in batadv_purge_orig_ref() [1] Root cause is unknown, but we can avoid spending too much time there and perhaps get more interesting reports. [1] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:6:621] Modules linked in: irq event stamp: 6182794 hardirqs last enabled at (6182793): [<ffff8000801dae10>] __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386 hardirqs last disabled at (6182794): [<ffff80008ad66a78>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (6182794): [<ffff80008ad66a78>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (6182792): [<ffff80008aab71c4>] spin_unlock_bh include/linux/spinlock.h:396 [inline] softirqs last enabled at (6182792): [<ffff80008aab71c4>] batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287 softirqs last disabled at (6182790): [<ffff80008aab61dc>] spin_lock_bh include/linux/spinlock.h:356 [inline] softirqs last disabled at (6182790): [<ffff80008aab61dc>] batadv_purge_orig_ref+0x164/0x1228 net/batman-adv/originator.c:1271 CPU: 0 PID: 621 Comm: kworker/u4:6 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Workqueue: bat_events batadv_purge_orig pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : should_resched arch/arm64/include/asm/preempt.h:79 [inline] pc : __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:388 lr : __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386 sp : ffff800099007970 x29: ffff800099007980 x28: 1fffe00018fce1bd x27: dfff800000000000 x26: ffff0000d2620008 x25: ffff0000c7e70de8 x24: 0000000000000001 x23: 1fffe00018e57781 x22: dfff800000000000 x21: ffff80008aab71c4 x20: ffff0001b40136c0 x19: ffff0000c72bbc08 x18: 1fffe0001a817bb0 x17: ffff800125414000 x16: ffff80008032116c x15: 0000000000000001 x14: 1fffe0001ee9d610 x13: 0000000000000000 x12: 0000000000000003 x11: 0000000000000000 x10: 0000000000ff0100 x9 : 0000000000000000 x8 : 00000000005e5789 x7 : ffff80008aab61dc x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000006 x1 : 0000000000000080 x0 : ffff800125414000 Call trace: __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:49 [inline] __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:386 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x3c/0x4c kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287 batadv_purge_orig+0x20/0x70 net/batman-adv/originator.c:1300 process_one_work+0x694/0x1204 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x938/0xef4 kernel/workqueue.c:2787 kthread+0x288/0x310 kernel/kthread.c:388 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:51 lr : default_idle_call+0xf8/0x128 kernel/sched/idle.c:103 sp : ffff800093a17d30 x29: ffff800093a17d30 x28: dfff800000000000 x27: 1ffff00012742fb4 x26: ffff80008ec9d000 x25: 0000000000000000 x24: 0000000000000002 x23: 1ffff00011d93a74 x22: ffff80008ec9d3a0 x21: 0000000000000000 x20: ffff0000c19dbc00 x19: ffff8000802d0fd8 x18: 1fffe00036804396 x17: ffff80008ec9d000 x16: ffff8000802d089c x15: 0000000000000001 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: i2c: lpi2c: Avoid calling clk_get_rate during transfer Instead of repeatedly calling clk_get_rate for each transfer, lock the clock rate and cache the value. A deadlock has been observed while adding tlv320aic32x4 audio codec to the system. When this clock provider adds its clock, the clk mutex is locked already, it needs to access i2c, which in return needs the mutex for clk_get_rate as well.
In the Linux kernel, the following vulnerability has been resolved: VMCI: Use threaded irqs instead of tasklets The vmci_dispatch_dgs() tasklet function calls vmci_read_data() which uses wait_event() resulting in invalid sleep in an atomic context (and therefore potentially in a deadlock). Use threaded irqs to fix this issue and completely remove usage of tasklets. [ 20.264639] BUG: sleeping function called from invalid context at drivers/misc/vmw_vmci/vmci_guest.c:145 [ 20.264643] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 762, name: vmtoolsd [ 20.264645] preempt_count: 101, expected: 0 [ 20.264646] RCU nest depth: 0, expected: 0 [ 20.264647] 1 lock held by vmtoolsd/762: [ 20.264648] #0: ffff0000874ae440 (sk_lock-AF_VSOCK){+.+.}-{0:0}, at: vsock_connect+0x60/0x330 [vsock] [ 20.264658] Preemption disabled at: [ 20.264659] [<ffff80000151d7d8>] vmci_send_datagram+0x44/0xa0 [vmw_vmci] [ 20.264665] CPU: 0 PID: 762 Comm: vmtoolsd Not tainted 5.19.0-0.rc8.20220727git39c3c396f813.60.fc37.aarch64 #1 [ 20.264667] Hardware name: VMware, Inc. VBSA/VBSA, BIOS VEFI 12/31/2020 [ 20.264668] Call trace: [ 20.264669] dump_backtrace+0xc4/0x130 [ 20.264672] show_stack+0x24/0x80 [ 20.264673] dump_stack_lvl+0x88/0xb4 [ 20.264676] dump_stack+0x18/0x34 [ 20.264677] __might_resched+0x1a0/0x280 [ 20.264679] __might_sleep+0x58/0x90 [ 20.264681] vmci_read_data+0x74/0x120 [vmw_vmci] [ 20.264683] vmci_dispatch_dgs+0x64/0x204 [vmw_vmci] [ 20.264686] tasklet_action_common.constprop.0+0x13c/0x150 [ 20.264688] tasklet_action+0x40/0x50 [ 20.264689] __do_softirq+0x23c/0x6b4 [ 20.264690] __irq_exit_rcu+0x104/0x214 [ 20.264691] irq_exit_rcu+0x1c/0x50 [ 20.264693] el1_interrupt+0x38/0x6c [ 20.264695] el1h_64_irq_handler+0x18/0x24 [ 20.264696] el1h_64_irq+0x68/0x6c [ 20.264697] preempt_count_sub+0xa4/0xe0 [ 20.264698] _raw_spin_unlock_irqrestore+0x64/0xb0 [ 20.264701] vmci_send_datagram+0x7c/0xa0 [vmw_vmci] [ 20.264703] vmci_datagram_dispatch+0x84/0x100 [vmw_vmci] [ 20.264706] vmci_datagram_send+0x2c/0x40 [vmw_vmci] [ 20.264709] vmci_transport_send_control_pkt+0xb8/0x120 [vmw_vsock_vmci_transport] [ 20.264711] vmci_transport_connect+0x40/0x7c [vmw_vsock_vmci_transport] [ 20.264713] vsock_connect+0x278/0x330 [vsock] [ 20.264715] __sys_connect_file+0x8c/0xc0 [ 20.264718] __sys_connect+0x84/0xb4 [ 20.264720] __arm64_sys_connect+0x2c/0x3c [ 20.264721] invoke_syscall+0x78/0x100 [ 20.264723] el0_svc_common.constprop.0+0x68/0x124 [ 20.264724] do_el0_svc+0x38/0x4c [ 20.264725] el0_svc+0x60/0x180 [ 20.264726] el0t_64_sync_handler+0x11c/0x150 [ 20.264728] el0t_64_sync+0x190/0x194
In the Linux kernel, the following vulnerability has been resolved: mm/kmemleak: avoid scanning potential huge holes When using devm_request_free_mem_region() and devm_memremap_pages() to add ZONE_DEVICE memory, if requested free mem region's end pfn were huge(e.g., 0x400000000), the node_end_pfn() will be also huge (see move_pfn_range_to_zone()). Thus it creates a huge hole between node_start_pfn() and node_end_pfn(). We found on some AMD APUs, amdkfd requested such a free mem region and created a huge hole. In such a case, following code snippet was just doing busy test_bit() looping on the huge hole. for (pfn = start_pfn; pfn < end_pfn; pfn++) { struct page *page = pfn_to_online_page(pfn); if (!page) continue; ... } So we got a soft lockup: watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221] CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1 RIP: 0010:pfn_to_online_page+0x5/0xd0 Call Trace: ? kmemleak_scan+0x16a/0x440 kmemleak_write+0x306/0x3a0 ? common_file_perm+0x72/0x170 full_proxy_write+0x5c/0x90 vfs_write+0xb9/0x260 ksys_write+0x67/0xe0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae I did some tests with the patch. (1) amdgpu module unloaded before the patch: real 0m0.976s user 0m0.000s sys 0m0.968s after the patch: real 0m0.981s user 0m0.000s sys 0m0.973s (2) amdgpu module loaded before the patch: real 0m35.365s user 0m0.000s sys 0m35.354s after the patch: real 0m1.049s user 0m0.000s sys 0m1.042s
In the Linux kernel, the following vulnerability has been resolved: NFSv4: Don't hold the layoutget locks across multiple RPC calls When doing layoutget as part of the open() compound, we have to be careful to release the layout locks before we can call any further RPC calls, such as setattr(). The reason is that those calls could trigger a recall, which could deadlock.
In the Linux kernel, the following vulnerability has been resolved: ath11k: Fix frames flush failure caused by deadlock We are seeing below warnings: kernel: [25393.301506] ath11k_pci 0000:01:00.0: failed to flush mgmt transmit queue 0 kernel: [25398.421509] ath11k_pci 0000:01:00.0: failed to flush mgmt transmit queue 0 kernel: [25398.421831] ath11k_pci 0000:01:00.0: dropping mgmt frame for vdev 0, is_started 0 this means ath11k fails to flush mgmt. frames because wmi_mgmt_tx_work has no chance to run in 5 seconds. By setting /proc/sys/kernel/hung_task_timeout_secs to 20 and increasing ATH11K_FLUSH_TIMEOUT to 50 we get below warnings: kernel: [ 120.763160] INFO: task wpa_supplicant:924 blocked for more than 20 seconds. kernel: [ 120.763169] Not tainted 5.10.90 #12 kernel: [ 120.763177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 120.763186] task:wpa_supplicant state:D stack: 0 pid: 924 ppid: 1 flags:0x000043a0 kernel: [ 120.763201] Call Trace: kernel: [ 120.763214] __schedule+0x785/0x12fa kernel: [ 120.763224] ? lockdep_hardirqs_on_prepare+0xe2/0x1bb kernel: [ 120.763242] schedule+0x7e/0xa1 kernel: [ 120.763253] schedule_timeout+0x98/0xfe kernel: [ 120.763266] ? run_local_timers+0x4a/0x4a kernel: [ 120.763291] ath11k_mac_flush_tx_complete+0x197/0x2b1 [ath11k 13c3a9bf37790f4ac8103b3decf7ab4008ac314a] kernel: [ 120.763306] ? init_wait_entry+0x2e/0x2e kernel: [ 120.763343] __ieee80211_flush_queues+0x167/0x21f [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763378] __ieee80211_recalc_idle+0x105/0x125 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763411] ieee80211_recalc_idle+0x14/0x27 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763441] ieee80211_free_chanctx+0x77/0xa2 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763473] __ieee80211_vif_release_channel+0x100/0x131 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763540] ieee80211_vif_release_channel+0x66/0x81 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763572] ieee80211_destroy_auth_data+0xa3/0xe6 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763612] ieee80211_mgd_deauth+0x178/0x29b [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763654] cfg80211_mlme_deauth+0x1a8/0x22c [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763697] nl80211_deauthenticate+0xfa/0x123 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763715] genl_rcv_msg+0x392/0x3c2 kernel: [ 120.763750] ? nl80211_associate+0x432/0x432 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763782] ? nl80211_associate+0x432/0x432 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763802] ? genl_rcv+0x36/0x36 kernel: [ 120.763814] netlink_rcv_skb+0x89/0xf7 kernel: [ 120.763829] genl_rcv+0x28/0x36 kernel: [ 120.763840] netlink_unicast+0x179/0x24b kernel: [ 120.763854] netlink_sendmsg+0x393/0x401 kernel: [ 120.763872] sock_sendmsg+0x72/0x76 kernel: [ 120.763886] ____sys_sendmsg+0x170/0x1e6 kernel: [ 120.763897] ? copy_msghdr_from_user+0x7a/0xa2 kernel: [ 120.763914] ___sys_sendmsg+0x95/0xd1 kernel: [ 120.763940] __sys_sendmsg+0x85/0xbf kernel: [ 120.763956] do_syscall_64+0x43/0x55 kernel: [ 120.763966] entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: [ 120.763977] RIP: 0033:0x79089f3fcc83 kernel: [ 120.763986] RSP: 002b:00007ffe604f0508 EFLAGS: 00000246 ORIG_RAX: 000000000000002e kernel: [ 120.763997] RAX: ffffffffffffffda RBX: 000059b40e987690 RCX: 000079089f3fcc83 kernel: [ 120.764006] RDX: 0000000000000000 RSI: 00007ffe604f0558 RDI: 0000000000000009 kernel: [ 120.764014] RBP: 00007ffe604f0540 R08: 0000000000000004 R09: 0000000000400000 kernel: [ 120.764023] R10: 00007ffe604f0638 R11: 0000000000000246 R12: 000059b40ea04980 kernel: [ 120.764032] R13: 00007ffe604 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: driver core: fix deadlock in __device_attach In __device_attach function, The lock holding logic is as follows: ... __device_attach device_lock(dev) // get lock dev async_schedule_dev(__device_attach_async_helper, dev); // func async_schedule_node async_schedule_node_domain(func) entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC); /* when fail or work limit, sync to execute func, but __device_attach_async_helper will get lock dev as well, which will lead to A-A deadlock. */ if (!entry || atomic_read(&entry_count) > MAX_WORK) { func; else queue_work_node(node, system_unbound_wq, &entry->work) device_unlock(dev) As shown above, when it is allowed to do async probes, because of out of memory or work limit, async work is not allowed, to do sync execute instead. it will lead to A-A deadlock because of __device_attach_async_helper getting lock dev. To fix the deadlock, move the async_schedule_dev outside device_lock, as we can see, in async_schedule_node_domain, the parameter of queue_work_node is system_unbound_wq, so it can accept concurrent operations. which will also not change the code logic, and will not lead to deadlock.
In the Linux kernel, the following vulnerability has been resolved: drivers: tty: serial: Fix deadlock in sa1100_set_termios() There is a deadlock in sa1100_set_termios(), which is shown below: (Thread 1) | (Thread 2) | sa1100_enable_ms() sa1100_set_termios() | mod_timer() spin_lock_irqsave() //(1) | (wait a time) ... | sa1100_timeout() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold sport->port.lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need sport->port.lock in position (2) of thread 2. As a result, sa1100_set_termios() will block forever. This patch moves del_timer_sync() before spin_lock_irqsave() in order to prevent the deadlock.
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix deadlock between concurrent dio writes when low on free data space When reserving data space for a direct IO write we can end up deadlocking if we have multiple tasks attempting a write to the same file range, there are multiple extents covered by that file range, we are low on available space for data and the writes don't expand the inode's i_size. The deadlock can happen like this: 1) We have a file with an i_size of 1M, at offset 0 it has an extent with a size of 128K and at offset 128K it has another extent also with a size of 128K; 2) Task A does a direct IO write against file range [0, 256K), and because the write is within the i_size boundary, it takes the inode's lock (VFS level) in shared mode; 3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and then gets the extent map for the extent covering the range [0, 128K). At btrfs_get_blocks_direct_write(), it creates an ordered extent for that file range ([0, 128K)); 4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file range [0, 256K); 5) Task A executes btrfs_dio_iomap_begin() again, this time for the file range [128K, 256K), and locks the file range [128K, 256K); 6) Task B starts a direct IO write against file range [0, 256K) as well. It also locks the inode in shared mode, as it's within the i_size limit, and then tries to lock file range [0, 256K). It is able to lock the subrange [0, 128K) but then blocks waiting for the range [128K, 256K), as it is currently locked by task A; 7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data space. Because we are low on available free space, it triggers the async data reclaim task, and waits for it to reserve data space; 8) The async reclaim task decides to wait for all existing ordered extents to complete (through btrfs_wait_ordered_roots()). It finds the ordered extent previously created by task A for the file range [0, 128K) and waits for it to complete; 9) The ordered extent for the file range [0, 128K) can not complete because it blocks at btrfs_finish_ordered_io() when trying to lock the file range [0, 128K). This results in a deadlock, because: - task B is holding the file range [0, 128K) locked, waiting for the range [128K, 256K) to be unlocked by task A; - task A is holding the file range [128K, 256K) locked and it's waiting for the async data reclaim task to satisfy its space reservation request; - the async data reclaim task is waiting for ordered extent [0, 128K) to complete, but the ordered extent can not complete because the file range [0, 128K) is currently locked by task B, which is waiting on task A to unlock file range [128K, 256K) and task A waiting on the async data reclaim task. This results in a deadlock between 4 task: task A, task B, the async data reclaim task and the task doing ordered extent completion (a work queue task). This type of deadlock can sporadically be triggered by the test case generic/300 from fstests, and results in a stack trace like the following: [12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds. [12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1 [12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000 [12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs] [12084.036599] Call Trace: [12084.036601] <TASK> [12084.036606] __schedule+0x3cb/0xed0 [12084.036616] schedule+0x4e/0xb0 [12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs] [12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0 [12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs] [12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs] [12084.0367 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: loop: implement ->free_disk Ensure that the lo_device which is stored in the gendisk private data is valid until the gendisk is freed. Currently the loop driver uses a lot of effort to make sure a device is not freed when it is still in use, but to to fix a potential deadlock this will be relaxed a bit soon.
In the Linux kernel, the following vulnerability has been resolved: scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock During stress I/O tests with 500+ vports, hard LOCKUP call traces are observed. CPU A: native_queued_spin_lock_slowpath+0x192 _raw_spin_lock_irqsave+0x32 lpfc_handle_fcp_err+0x4c6 lpfc_fcp_io_cmd_wqe_cmpl+0x964 lpfc_sli4_fp_handle_cqe+0x266 __lpfc_sli4_process_cq+0x105 __lpfc_sli4_hba_process_cq+0x3c lpfc_cq_poll_hdler+0x16 irq_poll_softirq+0x76 __softirqentry_text_start+0xe4 irq_exit+0xf7 do_IRQ+0x7f CPU B: native_queued_spin_lock_slowpath+0x5b _raw_spin_lock+0x1c lpfc_abort_handler+0x13e scmd_eh_abort_handler+0x85 process_one_work+0x1a7 worker_thread+0x30 kthread+0x112 ret_from_fork+0x1f Diagram of lockup: CPUA CPUB ---- ---- lpfc_cmd->buf_lock phba->hbalock lpfc_cmd->buf_lock phba->hbalock Fix by reordering the taking of the lpfc_cmd->buf_lock and phba->hbalock in lpfc_abort_handler routine so that it tries to take the lpfc_cmd->buf_lock first before phba->hbalock.
In the Linux kernel, the following vulnerability has been resolved: btrfs: get rid of warning on transaction commit when using flushoncommit When using the flushoncommit mount option, during almost every transaction commit we trigger a warning from __writeback_inodes_sb_nr(): $ cat fs/fs-writeback.c: (...) static void __writeback_inodes_sb_nr(struct super_block *sb, ... { (...) WARN_ON(!rwsem_is_locked(&sb->s_umount)); (...) } (...) The trace produced in dmesg looks like the following: [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3 [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186 [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3 [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...) [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246 [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000 [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50 [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000 [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488 [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460 [947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000 [947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0 [947.571072] Call Trace: [947.572354] <TASK> [947.573266] btrfs_commit_transaction+0x1f1/0x998 [947.576785] ? start_transaction+0x3ab/0x44e [947.579867] ? schedule_timeout+0x8a/0xdd [947.582716] transaction_kthread+0xe9/0x156 [947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407 [947.590104] kthread+0x131/0x139 [947.592168] ? set_kthread_struct+0x32/0x32 [947.595174] ret_from_fork+0x22/0x30 [947.597561] </TASK> [947.598553] ---[ end trace 644721052755541c ]--- This is because we started using writeback_inodes_sb() to flush delalloc when committing a transaction (when using -o flushoncommit), in order to avoid deadlocks with filesystem freeze operations. This change was made by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). After that change we started producing that warning, and every now and then a user reports this since the warning happens too often, it spams dmesg/syslog, and a user is unsure if this reflects any problem that might compromise the filesystem's reliability. We can not just lock the sb->s_umount semaphore before calling writeback_inodes_sb(), because that would at least deadlock with filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem() is called while we are holding that semaphore in write mode, and that can trigger a transaction commit, resulting in a deadlock. It would also trigger the same type of deadlock in the unmount path. Possibly, it could also introduce some other locking dependencies that lockdep would report. To fix this call try_to_writeback_inodes_sb() instead of writeback_inodes_sb(), because that will try to read lock sb->s_umount and then will only call writeback_inodes_sb() if it was able to lock it. This is fine because the cases where it can't read lock sb->s_umount are during a filesystem unmount or during a filesystem freeze - in those cases sb->s_umount is write locked and sync_filesystem() is called, which calls writeback_inodes_sb(). In other words, in all cases where we can't take a read lock on sb->s_umount, writeback is already being triggered elsewhere. An alternative would be to call btrfs_start_delalloc_roots() with a number of pages different from LONG_MAX, for example matching the number of delalloc bytes we currently have, in ---truncated---
In the Linux kernel, the following vulnerability has been resolved: bcache: avoid journal no-space deadlock by reserving 1 journal bucket The journal no-space deadlock was reported time to time. Such deadlock can happen in the following situation. When all journal buckets are fully filled by active jset with heavy write I/O load, the cache set registration (after a reboot) will load all active jsets and inserting them into the btree again (which is called journal replay). If a journaled bkey is inserted into a btree node and results btree node split, new journal request might be triggered. For example, the btree grows one more level after the node split, then the root node record in cache device super block will be upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no space in journal buckets, the journal replay has to wait for new journal bucket to be reclaimed after at least one journal bucket replayed. This is one example that how the journal no-space deadlock happens. The solution to avoid the deadlock is to reserve 1 journal bucket in run time, and only permit the reserved journal bucket to be used during cache set registration procedure for things like journal replay. Then the journal space will never be fully filled, there is no chance for journal no-space deadlock to happen anymore. This patch adds a new member "bool do_reserve" in struct journal, it is inititalized to 0 (false) when struct journal is allocated, and set to 1 (true) by bch_journal_space_reserve() when all initialization done in run_cache_set(). In the run time when journal_reclaim() tries to allocate a new journal bucket, free_journal_buckets() is called to check whether there are enough free journal buckets to use. If there is only 1 free journal bucket and journal->do_reserve is 1 (true), the last bucket is reserved and free_journal_buckets() will return 0 to indicate no free journal bucket. Then journal_reclaim() will give up, and try next time to see whetheer there is free journal bucket to allocate. By this method, there is always 1 jouranl bucket reserved in run time. During the cache set registration, journal->do_reserve is 0 (false), so the reserved journal bucket can be used to avoid the no-space deadlock.
In the Linux kernel, the following vulnerability has been resolved: drm/vc4: Fix deadlock on DSI device attach error DSI device attach to DSI host will be done with host device's lock held. Un-registering host in "device attach" error path (ex: probe retry) will result in deadlock with below call trace and non operational DSI display. Startup Call trace: [ 35.043036] rt_mutex_slowlock.constprop.21+0x184/0x1b8 [ 35.043048] mutex_lock_nested+0x7c/0xc8 [ 35.043060] device_del+0x4c/0x3e8 [ 35.043075] device_unregister+0x20/0x40 [ 35.043082] mipi_dsi_remove_device_fn+0x18/0x28 [ 35.043093] device_for_each_child+0x68/0xb0 [ 35.043105] mipi_dsi_host_unregister+0x40/0x90 [ 35.043115] vc4_dsi_host_attach+0xf0/0x120 [vc4] [ 35.043199] mipi_dsi_attach+0x30/0x48 [ 35.043209] tc358762_probe+0x128/0x164 [tc358762] [ 35.043225] mipi_dsi_drv_probe+0x28/0x38 [ 35.043234] really_probe+0xc0/0x318 [ 35.043244] __driver_probe_device+0x80/0xe8 [ 35.043254] driver_probe_device+0xb8/0x118 [ 35.043263] __device_attach_driver+0x98/0xe8 [ 35.043273] bus_for_each_drv+0x84/0xd8 [ 35.043281] __device_attach+0xf0/0x150 [ 35.043290] device_initial_probe+0x1c/0x28 [ 35.043300] bus_probe_device+0xa4/0xb0 [ 35.043308] deferred_probe_work_func+0xa0/0xe0 [ 35.043318] process_one_work+0x254/0x700 [ 35.043330] worker_thread+0x4c/0x448 [ 35.043339] kthread+0x19c/0x1a8 [ 35.043348] ret_from_fork+0x10/0x20 Shutdown Call trace: [ 365.565417] Call trace: [ 365.565423] __switch_to+0x148/0x200 [ 365.565452] __schedule+0x340/0x9c8 [ 365.565467] schedule+0x48/0x110 [ 365.565479] schedule_timeout+0x3b0/0x448 [ 365.565496] wait_for_completion+0xac/0x138 [ 365.565509] __flush_work+0x218/0x4e0 [ 365.565523] flush_work+0x1c/0x28 [ 365.565536] wait_for_device_probe+0x68/0x158 [ 365.565550] device_shutdown+0x24/0x348 [ 365.565561] kernel_restart_prepare+0x40/0x50 [ 365.565578] kernel_restart+0x20/0x70 [ 365.565591] __do_sys_reboot+0x10c/0x220 [ 365.565605] __arm64_sys_reboot+0x2c/0x38 [ 365.565619] invoke_syscall+0x4c/0x110 [ 365.565634] el0_svc_common.constprop.3+0xfc/0x120 [ 365.565648] do_el0_svc+0x2c/0x90 [ 365.565661] el0_svc+0x4c/0xf0 [ 365.565671] el0t_64_sync_handler+0x90/0xb8 [ 365.565682] el0t_64_sync+0x180/0x184
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8192eu: Fix deadlock in rtw_joinbss_event_prehandle There is a deadlock in rtw_joinbss_event_prehandle(), which is shown below: (Thread 1) | (Thread 2) | _set_timer() rtw_joinbss_event_prehandle()| mod_timer() spin_lock_bh() //(1) | (wait a time) ... | rtw_join_timeout_handler() | _rtw_join_timeout_handler() del_timer_sync() | spin_lock_bh() //(2) (wait timer to stop) | ... We hold pmlmepriv->lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need pmlmepriv->lock in position (2) of thread 2. As a result, rtw_joinbss_event_prehandle() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_bh(), which could let timer handler to obtain the needed lock. What`s more, we change spin_lock_bh() to spin_lock_irq() in _rtw_join_timeout_handler() in order to prevent deadlock.
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback() There is a deadlock in rtw_surveydone_event_callback(), which is shown below: (Thread 1) | (Thread 2) | _set_timer() rtw_surveydone_event_callback()| mod_timer() spin_lock_bh() //(1) | (wait a time) ... | rtw_scan_timeout_handler() del_timer_sync() | spin_lock_bh() //(2) (wait timer to stop) | ... We hold pmlmepriv->lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need pmlmepriv->lock in position (2) of thread 2. As a result, rtw_surveydone_event_callback() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_bh(), which could let timer handler to obtain the needed lock. What`s more, we change spin_lock_bh() in rtw_scan_timeout_handler() to spin_lock_irq(). Otherwise, spin_lock_bh() will also cause deadlock() in timer handler.
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop() There is a deadlock in ieee80211_beacons_stop(), which is shown below: (Thread 1) | (Thread 2) | ieee80211_send_beacon() ieee80211_beacons_stop() | mod_timer() spin_lock_irqsave() //(1) | (wait a time) ... | ieee80211_send_beacon_cb() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold ieee->beacon_lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need ieee->beacon_lock in position (2) of thread 2. As a result, ieee80211_beacons_stop() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_irqsave(), which could let timer handler to obtain the needed lock.
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop() There is a deadlock in rtllib_beacons_stop(), which is shown below: (Thread 1) | (Thread 2) | rtllib_send_beacon() rtllib_beacons_stop() | mod_timer() spin_lock_irqsave() //(1) | (wait a time) ... | rtllib_send_beacon_cb() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold ieee->beacon_lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need ieee->beacon_lock in position (2) of thread 2. As a result, rtllib_beacons_stop() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_irqsave(), which could let timer handler to obtain the needed lock.
In the Linux kernel, the following vulnerability has been resolved: block: Fix potential deadlock in blk_ia_range_sysfs_show() When being read, a sysfs attribute is already protected against removal with the kobject node active reference counter. As a result, in blk_ia_range_sysfs_show(), there is no need to take the queue sysfs lock when reading the value of a range attribute. Using the queue sysfs lock in this function creates a potential deadlock situation with the disk removal, something that a lockdep signals with a splat when the device is removed: [ 760.703551] Possible unsafe locking scenario: [ 760.703551] [ 760.703554] CPU0 CPU1 [ 760.703556] ---- ---- [ 760.703558] lock(&q->sysfs_lock); [ 760.703565] lock(kn->active#385); [ 760.703573] lock(&q->sysfs_lock); [ 760.703579] lock(kn->active#385); [ 760.703587] [ 760.703587] *** DEADLOCK *** Solve this by removing the mutex_lock()/mutex_unlock() calls from blk_ia_range_sysfs_show().
In the Linux kernel, the following vulnerability has been resolved: scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard lockup call trace hangs the system. Call Trace: _raw_spin_lock_irqsave+0x32/0x40 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc] lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc] lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc] lpfc_els_flush_cmd+0x43c/0x670 [lpfc] lpfc_els_flush_all_cmd+0x37/0x60 [lpfc] lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc] lpfc_do_work+0x1485/0x1d70 [lpfc] kthread+0x112/0x130 ret_from_fork+0x1f/0x40 Kernel panic - not syncing: Hard LOCKUP The same CPU tries to claim the phba->port_list_lock twice. Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and lpfc_printf_log() macros before calling lpfc_dmp_dbg(). There is no need to take the phba->port_list_lock within lpfc_dmp_dbg().
In the Linux kernel, the following vulnerability has been resolved: mm: vmscan: remove deadlock due to throttling failing to make progress A soft lockup bug in kcompactd was reported in a private bugzilla with the following visible in dmesg; watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479] watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479] watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479] watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479] The machine had 256G of RAM with no swap and an earlier failed allocation indicated that node 0 where kcompactd was run was potentially unreclaimable; Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes Vlastimil Babka investigated a crash dump and found that a task migrating pages was trying to drain PCP lists; PID: 52922 TASK: ffff969f820e5000 CPU: 19 COMMAND: "kworker/u128:3" Call Trace: __schedule schedule schedule_timeout wait_for_completion __flush_work __drain_all_pages __alloc_pages_slowpath.constprop.114 __alloc_pages alloc_migration_target migrate_pages migrate_to_node do_migrate_pages cpuset_migrate_mm_workfn process_one_work worker_thread kthread ret_from_fork This failure is specific to CONFIG_PREEMPT=n builds. The root of the problem is that kcompact0 is not rescheduling on a CPU while a task that has isolated a large number of the pages from the LRU is waiting on kcompact0 to reschedule so the pages can be released. While shrink_inactive_list() only loops once around too_many_isolated, reclaim can continue without rescheduling if sc->skipped_deactivate == 1 which could happen if there was no file LRU and the inactive anon list was not low.
In the Linux kernel, the following vulnerability has been resolved: drivers: usb: host: Fix deadlock in oxu_bus_suspend() There is a deadlock in oxu_bus_suspend(), which is shown below: (Thread 1) | (Thread 2) | timer_action() oxu_bus_suspend() | mod_timer() spin_lock_irq() //(1) | (wait a time) ... | oxu_watchdog() del_timer_sync() | spin_lock_irq() //(2) (wait timer to stop) | ... We hold oxu->lock in position (1) of thread 1, and use del_timer_sync() to wait timer to stop, but timer handler also need oxu->lock in position (2) of thread 2. As a result, oxu_bus_suspend() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_irq(), which could let timer handler to obtain the needed lock.
In the Linux kernel, the following vulnerability has been resolved: RDMA/ib_srp: Fix a deadlock Remove the flush_workqueue(system_long_wq) call since flushing system_long_wq is deadlock-prone and since that call is redundant with a preceding cancel_work_sync()
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8192bs: Fix deadlock in rtw_joinbss_event_prehandle() There is a deadlock in rtw_joinbss_event_prehandle(), which is shown below: (Thread 1) | (Thread 2) | _set_timer() rtw_joinbss_event_prehandle()| mod_timer() spin_lock_bh() //(1) | (wait a time) ... | _rtw_join_timeout_handler() del_timer_sync() | spin_lock_bh() //(2) (wait timer to stop) | ... We hold pmlmepriv->lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need pmlmepriv->lock in position (2) of thread 2. As a result, rtw_joinbss_event_prehandle() will block forever. This patch extracts del_timer_sync() from the protection of spin_lock_bh(), which could let timer handler to obtain the needed lock. What`s more, we change spin_lock_bh() to spin_lock_irq() in _rtw_join_timeout_handler() in order to prevent deadlock.
In the Linux kernel, the following vulnerability has been resolved: cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that cpuset_attach() is also called from cgroup_attach_task_all(). Add cpus_read_lock() like what cgroup_procs_write_start() does.
In the Linux kernel, the following vulnerability has been resolved: IB/core: Fix a nested dead lock as part of ODP flow Fix a nested dead lock as part of ODP flow by using mmput_async(). From the below call trace [1] can see that calling mmput() once we have the umem_odp->umem_mutex locked as required by ib_umem_odp_map_dma_and_lock() might trigger in the same task the exit_mmap()->__mmu_notifier_release()->mlx5_ib_invalidate_range() which may dead lock when trying to lock the same mutex. Moving to use mmput_async() will solve the problem as the above exit_mmap() flow will be called in other task and will be executed once the lock will be available. [1] [64843.077665] task:kworker/u133:2 state:D stack: 0 pid:80906 ppid: 2 flags:0x00004000 [64843.077672] Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] [64843.077719] Call Trace: [64843.077722] <TASK> [64843.077724] __schedule+0x23d/0x590 [64843.077729] schedule+0x4e/0xb0 [64843.077735] schedule_preempt_disabled+0xe/0x10 [64843.077740] __mutex_lock.constprop.0+0x263/0x490 [64843.077747] __mutex_lock_slowpath+0x13/0x20 [64843.077752] mutex_lock+0x34/0x40 [64843.077758] mlx5_ib_invalidate_range+0x48/0x270 [mlx5_ib] [64843.077808] __mmu_notifier_release+0x1a4/0x200 [64843.077816] exit_mmap+0x1bc/0x200 [64843.077822] ? walk_page_range+0x9c/0x120 [64843.077828] ? __cond_resched+0x1a/0x50 [64843.077833] ? mutex_lock+0x13/0x40 [64843.077839] ? uprobe_clear_state+0xac/0x120 [64843.077860] mmput+0x5f/0x140 [64843.077867] ib_umem_odp_map_dma_and_lock+0x21b/0x580 [ib_core] [64843.077931] pagefault_real_mr+0x9a/0x140 [mlx5_ib] [64843.077962] pagefault_mr+0xb4/0x550 [mlx5_ib] [64843.077992] pagefault_single_data_segment.constprop.0+0x2ac/0x560 [mlx5_ib] [64843.078022] mlx5_ib_eqe_pf_action+0x528/0x780 [mlx5_ib] [64843.078051] process_one_work+0x22b/0x3d0 [64843.078059] worker_thread+0x53/0x410 [64843.078065] ? process_one_work+0x3d0/0x3d0 [64843.078073] kthread+0x12a/0x150 [64843.078079] ? set_kthread_struct+0x50/0x50 [64843.078085] ret_from_fork+0x22/0x30 [64843.078093] </TASK>
In the Linux kernel, the following vulnerability has been resolved: tcp: make sure init the accept_queue's spinlocks once When I run syz's reproduction C program locally, it causes the following issue: pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0! WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7 30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900 RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000 R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000 FS: 00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0 Call Trace: <IRQ> _raw_spin_unlock (kernel/locking/spinlock.c:186) inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321) inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358) tcp_check_req (net/ipv4/tcp_minisocks.c:868) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205) ip_local_deliver_finish (net/ipv4/ip_input.c:234) __netif_receive_skb_one_core (net/core/dev.c:5529) process_backlog (./include/linux/rcupdate.h:779) __napi_poll (net/core/dev.c:6533) net_rx_action (net/core/dev.c:6604) __do_softirq (./arch/x86/include/asm/jump_label.h:27) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4374) ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469) tcp_rcv_state_process (net/ipv4/tcp_input.c:6657) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929) __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968) release_sock (net/core/sock.c:3536) inet_wait_for_connect (net/ipv4/af_inet.c:609) __inet_stream_connect (net/ipv4/af_inet.c:702) inet_stream_connect (net/ipv4/af_inet.c:748) __sys_connect (./include/linux/file.h:45 net/socket.c:2064) __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7fa10ff05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640 R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20 </TASK> The issue triggering process is analyzed as follows: Thread A Thread B tcp_v4_rcv //receive ack TCP packet inet_shutdown tcp_check_req tcp_disconnect //disconnect sock ... tcp_set_state(sk, TCP_CLOSE) inet_csk_complete_hashdance ... inet_csk_reqsk_queue_add ---truncated---
In the Linux kernel, the following vulnerability has been resolved: mptcp: fix possible deadlock in subflow diag Syzbot and Eric reported a lockdep splat in the subflow diag: WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 As noted by Eric we can break the lock dependency chain avoid dumping ---truncated---
In the Linux kernel, the following vulnerability has been resolved: dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape For raid456, if reshape is still in progress, then IO across reshape position will wait for reshape to make progress. However, for dm-raid, in following cases reshape will never make progress hence IO will hang: 1) the array is read-only; 2) MD_RECOVERY_WAIT is set; 3) MD_RECOVERY_FROZEN is set; After commit c467e97f079f ("md/raid6: use valid sector values to determine if an I/O should wait on the reshape") fix the problem that IO across reshape position doesn't wait for reshape, the dm-raid test shell/lvconvert-raid-reshape.sh start to hang: [root@fedora ~]# cat /proc/979/stack [<0>] wait_woken+0x7d/0x90 [<0>] raid5_make_request+0x929/0x1d70 [raid456] [<0>] md_handle_request+0xc2/0x3b0 [md_mod] [<0>] raid_map+0x2c/0x50 [dm_raid] [<0>] __map_bio+0x251/0x380 [dm_mod] [<0>] dm_submit_bio+0x1f0/0x760 [dm_mod] [<0>] __submit_bio+0xc2/0x1c0 [<0>] submit_bio_noacct_nocheck+0x17f/0x450 [<0>] submit_bio_noacct+0x2bc/0x780 [<0>] submit_bio+0x70/0xc0 [<0>] mpage_readahead+0x169/0x1f0 [<0>] blkdev_readahead+0x18/0x30 [<0>] read_pages+0x7c/0x3b0 [<0>] page_cache_ra_unbounded+0x1ab/0x280 [<0>] force_page_cache_ra+0x9e/0x130 [<0>] page_cache_sync_ra+0x3b/0x110 [<0>] filemap_get_pages+0x143/0xa30 [<0>] filemap_read+0xdc/0x4b0 [<0>] blkdev_read_iter+0x75/0x200 [<0>] vfs_read+0x272/0x460 [<0>] ksys_read+0x7a/0x170 [<0>] __x64_sys_read+0x1c/0x30 [<0>] do_syscall_64+0xc6/0x230 [<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 This is because reshape can't make progress. For md/raid, the problem doesn't exist because register new sync_thread doesn't rely on the IO to be done any more: 1) If array is read-only, it can switch to read-write by ioctl/sysfs; 2) md/raid never set MD_RECOVERY_WAIT; 3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold 'reconfig_mutex', hence it can be cleared and reshape can continue by sysfs api 'sync_action'. However, I'm not sure yet how to avoid the problem in dm-raid yet. This patch on the one hand make sure raid_message() can't change sync_thread() through raid_message() after presuspend(), on the other hand detect the above 3 cases before wait for IO do be done in dm_suspend(), and let dm-raid requeue those IO.
In the Linux kernel, the following vulnerability has been resolved: smb: client: fix deadlock in smb2_find_smb_tcon() Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such deadlock.
Guests can trigger deadlock in Linux netback driver T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] The patch for XSA-392 introduced another issue which might result in a deadlock when trying to free the SKB of a packet dropped due to the XSA-392 handling (CVE-2022-42328). Additionally when dropping packages for other reasons the same deadlock could occur in case of netpoll being active for the interface the xen-netback driver is connected to (CVE-2022-42329).
Guests can trigger deadlock in Linux netback driver T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] The patch for XSA-392 introduced another issue which might result in a deadlock when trying to free the SKB of a packet dropped due to the XSA-392 handling (CVE-2022-42328). Additionally when dropping packages for other reasons the same deadlock could occur in case of netpoll being active for the interface the xen-netback driver is connected to (CVE-2022-42329).
In the Linux kernel, the following vulnerability has been resolved: netrom: fix possible dead-lock in nr_rt_ioctl() syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1] Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node) [1] WARNING: possible circular locking dependency detected 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted ------------------------------------------------------ syz-executor350/5129 is trying to acquire lock: ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 but task is already holding lock: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (nr_node_list_lock){+...}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_remove_node net/netrom/nr_route.c:299 [inline] nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355 nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&nr_node->node_lock){+...}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nr_node_list_lock); lock(&nr_node->node_lock); lock(nr_node_list_lock); lock(&nr_node->node_lock); *** DEADLOCK *** 1 lock held by syz-executor350/5129: #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] #0: ffffffff8f70 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: eth: sungem: remove .ndo_poll_controller to avoid deadlocks Erhard reports netpoll warnings from sungem: netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (gem_start_xmit+0x0/0x398) WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c gem_poll_controller() disables interrupts, which may sleep. We can't sleep in netpoll, it has interrupts disabled completely. Strangely, gem_poll_controller() doesn't even poll the completions, and instead acts as if an interrupt has fired so it just schedules NAPI and exits. None of this has been necessary for years, since netpoll invokes NAPI directly.
In the Linux kernel, the following vulnerability has been resolved: ALSA: Fix deadlocks with kctl removals at disconnection In snd_card_disconnect(), we set card->shutdown flag at the beginning, call callbacks and do sync for card->power_ref_sleep waiters at the end. The callback may delete a kctl element, and this can lead to a deadlock when the device was in the suspended state. Namely: * A process waits for the power up at snd_power_ref_and_wait() in snd_ctl_info() or read/write() inside card->controls_rwsem. * The system gets disconnected meanwhile, and the driver tries to delete a kctl via snd_ctl_remove*(); it tries to take card->controls_rwsem again, but this is already locked by the above. Since the sleeper isn't woken up, this deadlocks. An easy fix is to wake up sleepers before processing the driver disconnect callbacks but right after setting the card->shutdown flag. Then all sleepers will abort immediately, and the code flows again. So, basically this patch moves the wait_event() call at the right timing. While we're at it, just to be sure, call wait_event_all() instead of wait_event(), although we don't use exclusive events on this queue for now.
In the Linux kernel, the following vulnerability has been resolved: wifi: iwlwifi: Use request_module_nowait This appears to work around a deadlock regression that came in with the LED merge in 6.9. The deadlock happens on my system with 24 iwlwifi radios, so maybe it something like all worker threads are busy and some work that needs to complete cannot complete. [also remove unnecessary "load_module" var and now-wrong comment]
In the Linux kernel, the following vulnerability has been resolved: Reapply "drm/qxl: simplify qxl_fence_wait" This reverts commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea. Stephen Rostedt reports: "I went to run my tests on my VMs and the tests hung on boot up. Unfortunately, the most I ever got out was: [ 93.607888] Testing event system initcall: OK [ 93.667730] Running tests on all trace events: [ 93.669757] Testing all events: OK [ 95.631064] ------------[ cut here ]------------ Timed out after 60 seconds" and further debugging points to a possible circular locking dependency between the console_owner locking and the worker pool locking. Reverting the commit allows Steve's VM to boot to completion again. [ This may obviously result in the "[TTM] Buffer eviction failed" messages again, which was the reason for that original revert. But at this point this seems preferable to a non-booting system... ]
In the Linux kernel, the following vulnerability has been resolved: mm: use memalloc_nofs_save() in page_cache_ra_order() See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio. INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
In the Linux kernel, the following vulnerability has been resolved: HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up The flag I2C_HID_READ_PENDING is used to serialize I2C operations. However, this is not necessary, because I2C core already has its own locking for that. More importantly, this flag can cause a lock-up: if the flag is set in i2c_hid_xfer() and an interrupt happens, the interrupt handler (i2c_hid_irq) will check this flag and return immediately without doing anything, then the interrupt handler will be invoked again in an infinite loop. Since interrupt handler is an RT task, it takes over the CPU and the flag-clearing task never gets scheduled, thus we have a lock-up. Delete this unnecessary flag.
In the Linux kernel, the following vulnerability has been resolved: smb3: missing lock when picking channel Coverity spotted a place where we should have been holding the channel lock when accessing the ses channel index. Addresses-Coverity: 1582039 ("Data race condition (MISSING_LOCK)")
In the Linux kernel, the following vulnerability has been resolved: nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device.