In the Linux kernel, the following vulnerability has been resolved: drm/xe/preempt_fence: enlarge the fence critical section It is really easy to introduce subtle deadlocks in preempt_fence_work_func() since we operate on single global ordered-wq for signalling our preempt fences behind the scenes, so even though we signal a particular fence, everything in the callback should be in the fence critical section, since blocking in the callback will prevent other published fences from signalling. If we enlarge the fence critical section to cover the entire callback, then lockdep should be able to understand this better, and complain if we grab a sensitive lock like vm->lock, which is also held when waiting on preempt fences.
In the Linux kernel, the following vulnerability has been resolved: net: hns3: fix a deadlock problem when config TC during resetting When config TC during the reset process, may cause a deadlock, the flow is as below: pf reset start │ ▼ ...... setup tc │ │ ▼ ▼ DOWN: napi_disable() napi_disable()(skip) │ │ │ ▼ ▼ ...... ...... │ │ ▼ │ napi_enable() │ ▼ UINIT: netif_napi_del() │ ▼ ...... │ ▼ INIT: netif_napi_add() │ ▼ ...... global reset start │ │ ▼ ▼ UP: napi_enable()(skip) ...... │ │ ▼ ▼ ...... napi_disable() In reset process, the driver will DOWN the port and then UINIT, in this case, the setup tc process will UP the port before UINIT, so cause the problem. Adds a DOWN process in UINIT to fix it.
In the Linux kernel, the following vulnerability has been resolved: mptcp: make fallback action and fallback decision atomic Syzkaller reported the following splat: WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline] WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Modules linked in: CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary) Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline] RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline] RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline] RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153 Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00 RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45 RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001 RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000 FS: 00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0 Call Trace: <IRQ> tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432 tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975 tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166 tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925 tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363 ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:469 [inline] ip_rcv_finish net/ipv4/ip_input.c:447 [inline] NF_HOOK include/linux/netfilter.h:317 [inline] NF_HOOK include/linux/netfilter.h:311 [inline] ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567 __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975 __netif_receive_skb+0x1f/0x120 net/core/dev.c:6088 process_backlog+0x301/0x1360 net/core/dev.c:6440 __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453 napi_poll net/core/dev.c:7517 [inline] net_rx_action+0xb44/0x1010 net/core/dev.c:7644 handle_softirqs+0x1d0/0x770 kernel/softirq.c:579 do_softirq+0x3f/0x90 kernel/softirq.c:480 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407 local_bh_enable include/linux/bottom_half.h:33 [inline] inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524 mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985 mptcp_check_listen_stop net/mptcp/mib.h:118 [inline] __mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000 mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066 inet_release+0xed/0x200 net/ipv4/af_inet.c:435 inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487 __sock_release+0xb3/0x270 net/socket.c:649 sock_close+0x1c/0x30 net/socket.c:1439 __fput+0x402/0xb70 fs/file_table.c:465 task_work_run+0x150/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xd4 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: cfg80211: fix management registrations locking The management registrations locking was broken, the list was locked for each wdev, but cfg80211_mgmt_registrations_update() iterated it without holding all the correct spinlocks, causing list corruption. Rather than trying to fix it with fine-grained locking, just move the lock to the wiphy/rdev (still need the list on each wdev), we already need to hold the wdev lock to change it, so there's no contention on the lock in any case. This trivially fixes the bug since we hold one wdev's lock already, and now will hold the lock that protects all lists.
In the Linux kernel, the following vulnerability has been resolved: net: systemport: Add global locking for descriptor lifecycle The descriptor list is a shared resource across all of the transmit queues, and the locking mechanism used today only protects concurrency across a given transmit queue between the transmit and reclaiming. This creates an opportunity for the SYSTEMPORT hardware to work on corrupted descriptors if we have multiple producers at once which is the case when using multiple transmit queues. This was particularly noticeable when using multiple flows/transmit queues and it showed up in interesting ways in that UDP packets would get a correct UDP header checksum being calculated over an incorrect packet length. Similarly TCP packets would get an equally correct checksum computed by the hardware over an incorrect packet length. The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges when the driver produces a new descriptor anytime it writes to the WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to re-organize its descriptors and it is possible that concurrent TX queues eventually break this internal allocation scheme to the point where the length/status part of the descriptor gets used for an incorrect data buffer. The fix is to impose a global serialization for all TX queues in the short section where we are writing to the WRITE_PORT_{HI,LO} registers which solves the corruption even with multiple concurrent TX queues being used.
In the Linux kernel, the following vulnerability has been resolved: ubifs: Fix deadlock in concurrent rename whiteout and inode writeback Following hung tasks: [ 77.028764] task:kworker/u8:4 state:D stack: 0 pid: 132 [ 77.028820] Call Trace: [ 77.029027] schedule+0x8c/0x1b0 [ 77.029067] mutex_lock+0x50/0x60 [ 77.029074] ubifs_write_inode+0x68/0x1f0 [ubifs] [ 77.029117] __writeback_single_inode+0x43c/0x570 [ 77.029128] writeback_sb_inodes+0x259/0x740 [ 77.029148] wb_writeback+0x107/0x4d0 [ 77.029163] wb_workfn+0x162/0x7b0 [ 92.390442] task:aa state:D stack: 0 pid: 1506 [ 92.390448] Call Trace: [ 92.390458] schedule+0x8c/0x1b0 [ 92.390461] wb_wait_for_completion+0x82/0xd0 [ 92.390469] __writeback_inodes_sb_nr+0xb2/0x110 [ 92.390472] writeback_inodes_sb_nr+0x14/0x20 [ 92.390476] ubifs_budget_space+0x705/0xdd0 [ubifs] [ 92.390503] do_rename.cold+0x7f/0x187 [ubifs] [ 92.390549] ubifs_rename+0x8b/0x180 [ubifs] [ 92.390571] vfs_rename+0xdb2/0x1170 [ 92.390580] do_renameat2+0x554/0x770 , are caused by concurrent rename whiteout and inode writeback processes: rename_whiteout(Thread 1) wb_workfn(Thread2) ubifs_rename do_rename lock_4_inodes (Hold ui_mutex) ubifs_budget_space make_free_space shrink_liability __writeback_inodes_sb_nr bdi_split_work_to_wbs (Queue new wb work) wb_do_writeback(wb work) __writeback_single_inode ubifs_write_inode LOCK(ui_mutex) ↑ wb_wait_for_completion (Wait wb work) <-- deadlock! Reproducer (Detail program in [Link]): 1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT) 2. Consume out of space before kernel(mdelay) doing budget for whiteout Fix it by doing whiteout space budget before locking ubifs inodes. BTW, it also fixes wrong goto tag 'out_release' in whiteout budget error handling path(It should at least recover dir i_size and unlock 4 ubifs inodes).
In the Linux kernel, the following vulnerability has been resolved: scsi: ufs: Fix a deadlock in the error handler The following deadlock has been observed on a test setup: - All tags allocated - The SCSI error handler calls ufshcd_eh_host_reset_handler() - ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() - ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request.
In the Linux kernel, the following vulnerability has been resolved: mptcp: fix deadlock in __mptcp_push_pending() __mptcp_push_pending() may call mptcp_flush_join_list() with subflow socket lock held. If such call hits mptcp_sockopt_sync_all() then subsequently __mptcp_sockopt_sync() could try to lock the subflow socket for itself, causing a deadlock. sysrq: Show Blocked State task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x2d6/0x10c0 ? __mod_memcg_state+0x4d/0x70 ? csum_partial+0xd/0x20 ? _raw_spin_lock_irqsave+0x26/0x50 schedule+0x4e/0xc0 __lock_sock+0x69/0x90 ? do_wait_intr_irq+0xa0/0xa0 __lock_sock_fast+0x35/0x50 mptcp_sockopt_sync_all+0x38/0xc0 __mptcp_push_pending+0x105/0x200 mptcp_sendmsg+0x466/0x490 sock_sendmsg+0x57/0x60 __sys_sendto+0xf0/0x160 ? do_wait_intr_irq+0xa0/0xa0 ? fpregs_restore_userregs+0x12/0xd0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9ba546c2d0 RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0 RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234 RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060 R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8 </TASK> Fix the issue by using __mptcp_flush_join_list() instead of plain mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by Florian. The sockopt sync will be deferred to the workqueue.
In the Linux kernel, the following vulnerability has been resolved: soc/tegra: regulators: Fix locking up when voltage-spread is out of range Fix voltage coupler lockup which happens when voltage-spread is out of range due to a bug in the code. The max-spread requirement shall be accounted when CPU regulator doesn't have consumers. This problem is observed on Tegra30 Ouya game console once system-wide DVFS is enabled in a device-tree.
In the Linux kernel, the following vulnerability has been resolved: nvmet-tcp: fix incorrect locking in state_change sk callback We are not changing anything in the TCP connection state so we should not take a write_lock but rather a read lock. This caused a deadlock when running nvmet-tcp and nvme-tcp on the same system, where state_change callbacks on the host and on the controller side have causal relationship and made lockdep report on this with blktests: ================================ WARNING: inconsistent lock state 5.12.0-rc3 #1 Tainted: G I -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage. nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp] {IN-SOFTIRQ-W} state was registered at: __lock_acquire+0x79b/0x18d0 lock_acquire+0x1ca/0x480 _raw_write_lock_bh+0x39/0x80 nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp] tcp_fin+0x2a8/0x780 tcp_data_queue+0xf94/0x1f20 tcp_rcv_established+0x6ba/0x1f00 tcp_v4_do_rcv+0x502/0x760 tcp_v4_rcv+0x257e/0x3430 ip_protocol_deliver_rcu+0x69/0x6a0 ip_local_deliver_finish+0x1e2/0x2f0 ip_local_deliver+0x1a2/0x420 ip_rcv+0x4fb/0x6b0 __netif_receive_skb_one_core+0x162/0x1b0 process_backlog+0x1ff/0x770 __napi_poll.constprop.0+0xa9/0x5c0 net_rx_action+0x7b3/0xb30 __do_softirq+0x1f0/0x940 do_softirq+0xa1/0xd0 __local_bh_enable_ip+0xd8/0x100 ip_finish_output2+0x6b7/0x18a0 __ip_queue_xmit+0x706/0x1aa0 __tcp_transmit_skb+0x2068/0x2e20 tcp_write_xmit+0xc9e/0x2bb0 __tcp_push_pending_frames+0x92/0x310 inet_shutdown+0x158/0x300 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp] nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp] nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp] nvme_do_delete_ctrl+0x100/0x10c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write_iter+0x2c7/0x460 new_sync_write+0x36c/0x610 vfs_write+0x5c0/0x870 ksys_write+0xf9/0x1d0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae irq event stamp: 10687 hardirqs last enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40 hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90 softirqs last enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940 softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(clock-AF_INET); <Interrupt> lock(clock-AF_INET); *** DEADLOCK *** 5 locks held by nvme/1324: #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp] #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300 stack backtrace: CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 Call Trace: dump_stack+0x93/0xc2 mark_lock_irq.cold+0x2c/0xb3 ? verify_lock_unused+0x390/0x390 ? stack_trace_consume_entry+0x160/0x160 ? lock_downgrade+0x100/0x100 ? save_trace+0x88/0x5e0 ? _raw_spin_unlock_irqrestore+0x2d/0x40 mark_lock+0x530/0x1470 ? mark_lock_irq+0x1d10/0x1d10 ? enqueue_timer+0x660/0x660 mark_usage+0x215/0x2a0 __lock_acquire+0x79b/0x18d0 ? tcp_schedule_loss_probe.part.0+0x38c/0x520 lock_acquire+0x1ca/0x480 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? rcu_read_unlock+0x40/0x40 ? tcp_mtu_probe+0x1ae0/0x1ae0 ? kmalloc_reserve+0xa0/0xa0 ? sysfs_file_ops+0x170/0x170 _raw_read_lock+0x3d/0xa0 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp] nvme_tcp_state_change+0x21/0x150 [nvme_tcp] ? sysfs_file_ops ---truncated---
In the Linux kernel, the following vulnerability has been resolved: mtd: require write permissions for locking and badblock ioctls MEMLOCK, MEMUNLOCK and OTPLOCK modify protection bits. Thus require write permission. Depending on the hardware MEMLOCK might even be write-once, e.g. for SPI-NOR flashes with their WP# tied to GND. OTPLOCK is always write-once. MEMSETBADBLOCK modifies the bad block table.
In the Linux kernel, the following vulnerability has been resolved: mac80211: fix locking in ieee80211_start_ap error path We need to hold the local->mtx to release the channel context, as even encoded by the lockdep_assert_held() there. Fix it.
In the Linux kernel, the following vulnerability has been resolved: mac80211: fix deadlock in AP/VLAN handling Syzbot reports that when you have AP_VLAN interfaces that are up and close the AP interface they belong to, we get a deadlock. No surprise - since we dev_close() them with the wiphy mutex held, which goes back into the netdev notifier in cfg80211 and tries to acquire the wiphy mutex there. To fix this, we need to do two things: 1) prevent changing iftype while AP_VLANs are up, we can't easily fix this case since cfg80211 already calls us with the wiphy mutex held, but change_interface() is relatively rare in drivers anyway, so changing iftype isn't used much (and userspace has to fall back to down/change/up anyway) 2) pull the dev_close() loop over VLANs out of the wiphy mutex section in the normal stop case
In the Linux kernel, the following vulnerability has been resolved: usb: cdnsp: Fix deadlock issue in cdnsp_thread_irq_handler Patch fixes the following critical issue caused by deadlock which has been detected during testing NCM class: smp: csd: Detected non-responsive CSD lock (#1) on CPU#0 smp: csd: CSD lock (#1) unresponsive. .... RIP: 0010:native_queued_spin_lock_slowpath+0x61/0x1d0 RSP: 0018:ffffbc494011cde0 EFLAGS: 00000002 RAX: 0000000000000101 RBX: ffff9ee8116b4a68 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ee8116b4658 RBP: ffffbc494011cde0 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 R13: ffff9ee8116b4670 R14: 0000000000000246 R15: ffff9ee8116b4658 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7bcc41a830 CR3: 000000007a612003 CR4: 00000000001706e0 Call Trace: <IRQ> do_raw_spin_lock+0xc0/0xd0 _raw_spin_lock_irqsave+0x95/0xa0 cdnsp_gadget_ep_queue.cold+0x88/0x107 [cdnsp_udc_pci] usb_ep_queue+0x35/0x110 eth_start_xmit+0x220/0x3d0 [u_ether] ncm_tx_timeout+0x34/0x40 [usb_f_ncm] ? ncm_free_inst+0x50/0x50 [usb_f_ncm] __hrtimer_run_queues+0xac/0x440 hrtimer_run_softirq+0x8c/0xb0 __do_softirq+0xcf/0x428 asm_call_irq_on_stack+0x12/0x20 </IRQ> do_softirq_own_stack+0x61/0x70 irq_exit_rcu+0xc1/0xd0 sysvec_apic_timer_interrupt+0x52/0xb0 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:do_raw_spin_trylock+0x18/0x40 RSP: 0018:ffffbc494138bda8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff9ee8116b4658 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ee8116b4658 RBP: ffffbc494138bda8 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 R13: ffff9ee8116b4670 R14: ffff9ee7b5c73d80 R15: ffff9ee8116b4000 _raw_spin_lock+0x3d/0x70 ? cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] ? cdnsp_remove_request+0x1f0/0x1f0 [cdnsp_udc_pci] ? cdnsp_thread_irq_handler+0x5/0xa0 [cdnsp_udc_pci] ? irq_thread+0xa0/0x1c0 irq_thread_fn+0x28/0x60 irq_thread+0x105/0x1c0 ? __kthread_parkme+0x42/0x90 ? irq_forced_thread_fn+0x90/0x90 ? wake_threads_waitq+0x30/0x30 ? irq_thread_check_affinity+0xe0/0xe0 kthread+0x12a/0x160 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 The root cause of issue is spin_lock/spin_unlock instruction instead spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler function.
In the Linux kernel, the following vulnerability has been resolved: nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert After commit 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()"), the call to get_user_pages() will trigger the mmap assert. static inline void mmap_assert_locked(struct mm_struct *mm) { lockdep_assert_held(&mm->mmap_lock); VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); } [ 62.521410] kernel BUG at include/linux/mmap_lock.h:156! ........................................................... [ 62.538938] RIP: 0010:find_vma+0x32/0x80 ........................................................... [ 62.605889] Call Trace: [ 62.608502] <TASK> [ 62.610956] ? lock_timer_base+0x61/0x80 [ 62.614106] find_extend_vma+0x19/0x80 [ 62.617195] __get_user_pages+0x9b/0x6a0 [ 62.620356] __gup_longterm_locked+0x42d/0x450 [ 62.623721] ? finish_wait+0x41/0x80 [ 62.626748] ? __kmalloc+0x178/0x2f0 [ 62.629768] ne_set_user_memory_region_ioctl.isra.0+0x225/0x6a0 [nitro_enclaves] [ 62.635776] ne_enclave_ioctl+0x1cf/0x6d7 [nitro_enclaves] [ 62.639541] __x64_sys_ioctl+0x82/0xb0 [ 62.642620] do_syscall_64+0x3b/0x90 [ 62.645642] entry_SYSCALL_64_after_hwframe+0x44/0xae Use get_user_pages_unlocked() when setting the enclave memory regions. That's a similar pattern as mmap_read_lock() used together with get_user_pages().
In the Linux kernel, the following vulnerability has been resolved: tipc: wait and exit until all work queues are done On some host, a crash could be triggered simply by repeating these commands several times: # modprobe tipc # tipc bearer enable media udp name UDP1 localip 127.0.0.1 # rmmod tipc [] BUG: unable to handle kernel paging request at ffffffffc096bb00 [] Workqueue: events 0xffffffffc096bb00 [] Call Trace: [] ? process_one_work+0x1a7/0x360 [] ? worker_thread+0x30/0x390 [] ? create_worker+0x1a0/0x1a0 [] ? kthread+0x116/0x130 [] ? kthread_flush_work_fn+0x10/0x10 [] ? ret_from_fork+0x35/0x40 When removing the TIPC module, the UDP tunnel sock will be delayed to release in a work queue as sock_release() can't be done in rtnl_lock(). If the work queue is schedule to run after the TIPC module is removed, kernel will crash as the work queue function cleanup_beareri() code no longer exists when trying to invoke it. To fix it, this patch introduce a member wq_count in tipc_net to track the numbers of work queues in schedule, and wait and exit until all work queues are done in tipc_exit_net().
In the Linux kernel, the following vulnerability has been resolved: drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume In current code, when a PCI error state pci_channel_io_normal is detectd, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI driver will continue the execution of PCI resume callback report_resume by pci_walk_bridge, and the callback will go into amdgpu_pci_resume finally, where write lock is releasd unconditionally without acquiring such lock first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in amdgpu_device strucutre to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen.
In the Linux kernel, the following vulnerability has been resolved: cifs: Fix soft lockup during fsstress Below traces are observed during fsstress and system got hung. [ 130.698396] watchdog: BUG: soft lockup - CPU#6 stuck for 26s!
In the Linux kernel, the following vulnerability has been resolved: iio: adis16475: fix deadlock on frequency set With commit 39c024b51b560 ("iio: adis16475: improve sync scale mode handling"), two deadlocks were introduced: 1) The call to 'adis_write_reg_16()' was not changed to it's unlocked version. 2) The lock was not being released on the success path of the function. This change fixes both these issues.
The handle_stop_signal function in signal.c in Linux kernel 2.6.11 up to other versions before 2.6.13 and 2.6.12.6 allows local users to cause a denial of service (deadlock) by sending a SIGKILL to a real-time threaded process while it is performing a core dump.
In the Linux kernel, the following vulnerability has been resolved: netrom: fix possible dead-lock in nr_rt_ioctl() syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1] Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node) [1] WARNING: possible circular locking dependency detected 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted ------------------------------------------------------ syz-executor350/5129 is trying to acquire lock: ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 but task is already holding lock: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (nr_node_list_lock){+...}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_remove_node net/netrom/nr_route.c:299 [inline] nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355 nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&nr_node->node_lock){+...}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nr_node_list_lock); lock(&nr_node->node_lock); lock(nr_node_list_lock); lock(&nr_node->node_lock); *** DEADLOCK *** 1 lock held by syz-executor350/5129: #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] #0: ffffffff8f70 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: block: fix deadlock between sd_remove & sd_release Our test report the following hung task: [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds. [ 2538.459427] Call trace: [ 2538.459430] __switch_to+0x174/0x338 [ 2538.459436] __schedule+0x628/0x9c4 [ 2538.459442] schedule+0x7c/0xe8 [ 2538.459447] schedule_preempt_disabled+0x24/0x40 [ 2538.459453] __mutex_lock+0x3ec/0xf04 [ 2538.459456] __mutex_lock_slowpath+0x14/0x24 [ 2538.459459] mutex_lock+0x30/0xd8 [ 2538.459462] del_gendisk+0xdc/0x350 [ 2538.459466] sd_remove+0x30/0x60 [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4 [ 2538.459474] device_release_driver+0x18/0x28 [ 2538.459478] bus_remove_device+0x15c/0x174 [ 2538.459483] device_del+0x1d0/0x358 [ 2538.459488] __scsi_remove_device+0xa8/0x198 [ 2538.459493] scsi_forget_host+0x50/0x70 [ 2538.459497] scsi_remove_host+0x80/0x180 [ 2538.459502] usb_stor_disconnect+0x68/0xf4 [ 2538.459506] usb_unbind_interface+0xd4/0x280 [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4 [ 2538.459514] device_release_driver+0x18/0x28 [ 2538.459518] bus_remove_device+0x15c/0x174 [ 2538.459523] device_del+0x1d0/0x358 [ 2538.459528] usb_disable_device+0x84/0x194 [ 2538.459532] usb_disconnect+0xec/0x300 [ 2538.459537] hub_event+0xb80/0x1870 [ 2538.459541] process_scheduled_works+0x248/0x4dc [ 2538.459545] worker_thread+0x244/0x334 [ 2538.459549] kthread+0x114/0x1bc [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds. [ 2538.461014] Call trace: [ 2538.461016] __switch_to+0x174/0x338 [ 2538.461021] __schedule+0x628/0x9c4 [ 2538.461025] schedule+0x7c/0xe8 [ 2538.461030] blk_queue_enter+0xc4/0x160 [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4 [ 2538.461037] scsi_execute_cmd+0x7c/0x23c [ 2538.461040] ioctl_internal_command+0x5c/0x164 [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0 [ 2538.461051] sd_release+0x50/0x94 [ 2538.461054] blkdev_put+0x190/0x28c [ 2538.461058] blkdev_release+0x28/0x40 [ 2538.461063] __fput+0xf8/0x2a8 [ 2538.461066] __fput_sync+0x28/0x5c [ 2538.461070] __arm64_sys_close+0x84/0xe8 [ 2538.461073] invoke_syscall+0x58/0x114 [ 2538.461078] el0_svc_common+0xac/0xe0 [ 2538.461082] do_el0_svc+0x1c/0x28 [ 2538.461087] el0_svc+0x38/0x68 [ 2538.461090] el0t_64_sync_handler+0x68/0xbc [ 2538.461093] el0t_64_sync+0x1a8/0x1ac T1: T2: sd_remove del_gendisk __blk_mark_disk_dead blk_freeze_queue_start ++q->mq_freeze_depth bdev_release mutex_lock(&disk->open_mutex) sd_release scsi_execute_cmd blk_queue_enter wait_event(!q->mq_freeze_depth) mutex_lock(&disk->open_mutex) SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in this scenario. This is a classic ABBA deadlock. To fix the deadlock, make sure we don't try to acquire disk->open_mutex after freezing the queue.
In the Linux kernel, the following vulnerability has been resolved: i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr When del_timer_sync() is called in an interrupt context it throws a warning because of potential deadlock. The timer is used only to exit from wait_for_completion() after a timeout so replacing the call with wait_for_completion_timeout() allows to remove the problematic timer and its related functions altogether.
In the Linux kernel, the following vulnerability has been resolved: i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock A deadlock may happen since the i3c_master_register() acquires &i3cbus->lock twice. See the log below. Use i3cdev->desc->info instead of calling i3c_device_info() to avoid acquiring the lock twice. v2: - Modified the title and commit message ============================================ WARNING: possible recursive locking detected 6.11.0-mainline -------------------------------------------- init/1 is trying to acquire lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock but task is already holding lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&i3cbus->lock); lock(&i3cbus->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by init/1: #0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach #1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register stack backtrace: CPU: 6 UID: 0 PID: 1 Comm: init Call trace: dump_backtrace+0xfc/0x17c show_stack+0x18/0x28 dump_stack_lvl+0x40/0xc0 dump_stack+0x18/0x24 print_deadlock_bug+0x388/0x390 __lock_acquire+0x18bc/0x32ec lock_acquire+0x134/0x2b0 down_read+0x50/0x19c i3c_bus_normaluse_lock+0x14/0x24 i3c_device_get_info+0x24/0x58 i3c_device_uevent+0x34/0xa4 dev_uevent+0x310/0x384 kobject_uevent_env+0x244/0x414 kobject_uevent+0x14/0x20 device_add+0x278/0x460 device_register+0x20/0x34 i3c_master_register_new_i3c_devs+0x78/0x154 i3c_master_register+0x6a0/0x6d4 mtk_i3c_master_probe+0x3b8/0x4d8 platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa0/0x15c driver_probe_device+0x3c/0x1ac __driver_attach+0xc4/0x1f0 bus_for_each_dev+0x104/0x160 driver_attach+0x24/0x34 bus_add_driver+0x14c/0x294 driver_register+0x68/0x104 __platform_driver_register+0x20/0x30 init_module+0x20/0xfe4 do_one_initcall+0x184/0x464 do_init_module+0x58/0x1ec load_module+0xefc/0x10c8 __arm64_sys_finit_module+0x238/0x33c invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x50/0xac el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac
A denial of service vulnerability was found in tipc_crypto_key_revoke in net/tipc/crypto.c in the Linux kernel’s TIPC subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system.
A denial of service vulnerability due to a deadlock was found in sctp_auto_asconf_init in net/sctp/socket.c in the Linux kernel’s SCTP subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system.
In the Linux kernel, the following vulnerability has been resolved: cgroup/cpuset: remove kernfs active break A warning was found: WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828 CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0 RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202 RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000 RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04 RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180 R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08 R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0 FS: 00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: kernfs_drain+0x15e/0x2f0 __kernfs_remove+0x165/0x300 kernfs_remove_by_name_ns+0x7b/0xc0 cgroup_rm_file+0x154/0x1c0 cgroup_addrm_files+0x1c2/0x1f0 css_clear_dir+0x77/0x110 kill_css+0x4c/0x1b0 cgroup_destroy_locked+0x194/0x380 cgroup_rmdir+0x2a/0x140 It can be explained by: rmdir echo 1 > cpuset.cpus kernfs_fop_write_iter // active=0 cgroup_rm_file kernfs_remove_by_name_ns kernfs_get_active // active=1 __kernfs_remove // active=0x80000002 kernfs_drain cpuset_write_resmask wait_event //waiting (active == 0x80000001) kernfs_break_active_protection // active = 0x80000001 // continue kernfs_unbreak_active_protection // active = 0x80000002 ... kernfs_should_drain_open_files // warning occurs kernfs_put_active This warning is caused by 'kernfs_break_active_protection' when it is writing to cpuset.cpus, and the cgroup is removed concurrently. The commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside get_online_cpus()") made cpuset_hotplug_workfn asynchronous, This change involves calling flush_work(), which can create a multiple processes circular locking dependency that involve cgroup_mutex, potentially leading to a deadlock. To avoid deadlock. the commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()") added 'kernfs_break_active_protection' in the cpuset_write_resmask. This could lead to this warning. After the commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug processing synchronous"), the cpuset_write_resmask no longer needs to wait the hotplug to finish, which means that concurrent hotplug and cpuset operations are no longer possible. Therefore, the deadlock doesn't exist anymore and it does not have to 'break active protection' now. To fix this warning, just remove kernfs_break_active_protection operation in the 'cpuset_write_resmask'.
In the Linux kernel, the following vulnerability has been resolved: iavf: Fix reset error handling Do not call iavf_close in iavf_reset_task error handling. Doing so can lead to double call of napi_disable, which can lead to deadlock there. Removing VF would lead to iavf_remove task being stuck, because it requires crit_lock, which is held by iavf_close. Call iavf_disable_vf if reset fail, so that driver will clean up remaining invalid resources. During rapid VF resets, HW can fail to setup VF mailbox. Wrong error handling can lead to iavf_remove being stuck with: [ 5218.999087] iavf 0000:82:01.0: Failed to init adminq: -53 ... [ 5267.189211] INFO: task repro.sh:11219 blocked for more than 30 seconds. [ 5267.189520] Tainted: G S E 5.18.0-04958-ga54ce3703613-dirty #1 [ 5267.189764] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5267.190062] task:repro.sh state:D stack: 0 pid:11219 ppid: 8162 flags:0x00000000 [ 5267.190347] Call Trace: [ 5267.190647] <TASK> [ 5267.190927] __schedule+0x460/0x9f0 [ 5267.191264] schedule+0x44/0xb0 [ 5267.191563] schedule_preempt_disabled+0x14/0x20 [ 5267.191890] __mutex_lock.isra.12+0x6e3/0xac0 [ 5267.192237] ? iavf_remove+0xf9/0x6c0 [iavf] [ 5267.192565] iavf_remove+0x12a/0x6c0 [iavf] [ 5267.192911] ? _raw_spin_unlock_irqrestore+0x1e/0x40 [ 5267.193285] pci_device_remove+0x36/0xb0 [ 5267.193619] device_release_driver_internal+0xc1/0x150 [ 5267.193974] pci_stop_bus_device+0x69/0x90 [ 5267.194361] pci_stop_and_remove_bus_device+0xe/0x20 [ 5267.194735] pci_iov_remove_virtfn+0xba/0x120 [ 5267.195130] sriov_disable+0x2f/0xe0 [ 5267.195506] ice_free_vfs+0x7d/0x2f0 [ice] [ 5267.196056] ? pci_get_device+0x4f/0x70 [ 5267.196496] ice_sriov_configure+0x78/0x1a0 [ice] [ 5267.196995] sriov_numvfs_store+0xfe/0x140 [ 5267.197466] kernfs_fop_write_iter+0x12e/0x1c0 [ 5267.197918] new_sync_write+0x10c/0x190 [ 5267.198404] vfs_write+0x24e/0x2d0 [ 5267.198886] ksys_write+0x5c/0xd0 [ 5267.199367] do_syscall_64+0x3a/0x80 [ 5267.199827] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 5267.200317] RIP: 0033:0x7f5b381205c8 [ 5267.200814] RSP: 002b:00007fff8c7e8c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 5267.201981] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5b381205c8 [ 5267.202620] RDX: 0000000000000002 RSI: 00005569420ee900 RDI: 0000000000000001 [ 5267.203426] RBP: 00005569420ee900 R08: 000000000000000a R09: 00007f5b38180820 [ 5267.204327] R10: 000000000000000a R11: 0000000000000246 R12: 00007f5b383c06e0 [ 5267.205193] R13: 0000000000000002 R14: 00007f5b383bb880 R15: 0000000000000002 [ 5267.206041] </TASK> [ 5267.206970] Kernel panic - not syncing: hung_task: blocked tasks [ 5267.207809] CPU: 48 PID: 551 Comm: khungtaskd Kdump: loaded Tainted: G S E 5.18.0-04958-ga54ce3703613-dirty #1 [ 5267.208726] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.11.0 11/02/2019 [ 5267.209623] Call Trace: [ 5267.210569] <TASK> [ 5267.211480] dump_stack_lvl+0x33/0x42 [ 5267.212472] panic+0x107/0x294 [ 5267.213467] watchdog.cold.8+0xc/0xbb [ 5267.214413] ? proc_dohung_task_timeout_secs+0x30/0x30 [ 5267.215511] kthread+0xf4/0x120 [ 5267.216459] ? kthread_complete_and_exit+0x20/0x20 [ 5267.217505] ret_from_fork+0x22/0x30 [ 5267.218459] </TASK>
In the Linux kernel, the following vulnerability has been resolved: drivers: staging: rtl8723bs: Fix locking in _rtw_join_timeout_handler() Commit 041879b12ddb ("drivers: staging: rtl8192bs: Fix deadlock in rtw_joinbss_event_prehandle()") besides fixing the deadlock also modified _rtw_join_timeout_handler() to use spin_[un]lock_irq() instead of spin_[un]lock_bh(). _rtw_join_timeout_handler() calls rtw_do_join() which takes pmlmepriv->scanned_queue.lock using spin_[un]lock_bh(). This spin_unlock_bh() call re-enables softirqs which triggers an oops in kernel/softirq.c: __local_bh_enable_ip() when it calls lockdep_assert_irqs_enabled(): [ 244.506087] WARNING: CPU: 2 PID: 0 at kernel/softirq.c:376 __local_bh_enable_ip+0xa6/0x100 ... [ 244.509022] Call Trace: [ 244.509048] <IRQ> [ 244.509100] _rtw_join_timeout_handler+0x134/0x170 [r8723bs] [ 244.509468] ? __pfx__rtw_join_timeout_handler+0x10/0x10 [r8723bs] [ 244.509772] ? __pfx__rtw_join_timeout_handler+0x10/0x10 [r8723bs] [ 244.510076] call_timer_fn+0x95/0x2a0 [ 244.510200] __run_timers.part.0+0x1da/0x2d0 This oops is causd by the switch to spin_[un]lock_irq() which disables the IRQs for the entire duration of _rtw_join_timeout_handler(). Disabling the IRQs is not necessary since all code taking this lock runs from either user contexts or from softirqs, switch back to spin_[un]lock_bh() to fix this.
In the Linux kernel, the following vulnerability has been resolved: igb: revert rtnl_lock() that causes deadlock The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock to eliminate a false data race shown below (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] The above race will never happen and the extra rtnl_lock causes deadlock below [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc Since the igb_disable_sriov() will call pci_disable_sriov() before releasing any resources, the netdev core will synchronize the cleanup to avoid any races. This patch removes the useless rtnl_(un)lock to guarantee correctness.
In the Linux kernel, the following vulnerability has been resolved: media: v4l2-mem2mem: add lock to protect parameter num_rdy Getting below error when using KCSAN to check the driver. Adding lock to protect parameter num_rdy when getting the value with function: v4l2_m2m_num_src_bufs_ready/v4l2_m2m_num_dst_bufs_ready. kworker/u16:3: [name:report&]BUG: KCSAN: data-race in v4l2_m2m_buf_queue kworker/u16:3: [name:report&] kworker/u16:3: [name:report&]read-write to 0xffffff8105f35b94 of 1 bytes by task 20865 on cpu 7: kworker/u16:3: v4l2_m2m_buf_queue+0xd8/0x10c
In the Linux kernel, the following vulnerability has been resolved: mm/swapfile: add cond_resched() in get_swap_pages() The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup.
In the Linux kernel, the following vulnerability has been resolved: ext4: avoid deadlock in fs reclaim with page writeback Ext4 has a filesystem wide lock protecting ext4_writepages() calls to avoid races with switching of journalled data flag or inode format. This lock can however cause a deadlock like: CPU0 CPU1 ext4_writepages() percpu_down_read(sbi->s_writepages_rwsem); ext4_change_inode_journal_flag() percpu_down_write(sbi->s_writepages_rwsem); - blocks, all readers block from now on ext4_do_writepages() ext4_init_io_end() kmem_cache_zalloc(io_end_cachep, GFP_KERNEL) fs_reclaim frees dentry... dentry_unlink_inode() iput() - last ref => iput_final() - inode dirty => write_inode_now()... ext4_writepages() tries to acquire sbi->s_writepages_rwsem and blocks forever Make sure we cannot recurse into filesystem reclaim from writeback code to avoid the deadlock.
In the Linux kernel, the following vulnerability has been resolved: netfilter: ipset: Rework long task execution when adding/deleting entries When adding/deleting large number of elements in one step in ipset, it can take a reasonable amount of time and can result in soft lockup errors. The patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") tried to fix it by limiting the max elements to process at all. However it was not enough, it is still possible that we get hung tasks. Lowering the limit is not reasonable, so the approach in this patch is as follows: rely on the method used at resizing sets and save the state when we reach a smaller internal batch limit, unlock/lock and proceed from the saved state. Thus we can avoid long continuous tasks and at the same time removed the limit to add/delete large number of elements in one step. The nfnl mutex is held during the whole operation which prevents one to issue other ipset commands in parallel.
In the Linux kernel, the following vulnerability has been resolved: Bluetooth: Fix possible deadlock in rfcomm_sk_state_change syzbot reports a possible deadlock in rfcomm_sk_state_change [1]. While rfcomm_sock_connect acquires the sk lock and waits for the rfcomm lock, rfcomm_sock_release could have the rfcomm lock and hit a deadlock for acquiring the sk lock. Here's a simplified flow: rfcomm_sock_connect: lock_sock(sk) rfcomm_dlc_open: rfcomm_lock() rfcomm_sock_release: rfcomm_sock_shutdown: rfcomm_lock() __rfcomm_dlc_close: rfcomm_k_state_change: lock_sock(sk) This patch drops the sk lock before calling rfcomm_dlc_open to avoid the possible deadlock and holds sk's reference count to prevent use-after-free after rfcomm_dlc_open completes.
In the Linux kernel, the following vulnerability has been resolved: io_uring: lock overflowing for IOPOLL syzbot reports an issue with overflow filling for IOPOLL: WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Workqueue: events_unbound io_ring_exit_work Call trace: io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773 io_fill_cqe_req io_uring/io_uring.h:168 [inline] io_do_iopoll+0x474/0x62c io_uring/rw.c:1065 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289 worker_thread+0x340/0x610 kernel/workqueue.c:2436 kthread+0x12c/0x158 kernel/kthread.c:376 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863 There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path.
In the Linux kernel, the following vulnerability has been resolved: octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt The commit 4af1b64f80fb ("octeontx2-pf: Fix lmtst ID used in aura free") uses the get/put_cpu() to protect the usage of percpu pointer in ->aura_freeptr() callback, but it also unnecessarily disable the preemption for the blockable memory allocation. The commit 87b93b678e95 ("octeontx2-pf: Avoid use of GFP_KERNEL in atomic context") tried to fix these sleep inside atomic warnings. But it only fix the one for the non-rt kernel. For the rt kernel, we still get the similar warnings like below. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by swapper/0/1: #0: ffff800009fc5fe8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30 #1: ffff000100c276c0 (&mbox->lock){+.+.}-{3:3}, at: otx2_init_hw_resources+0x8c/0x3a4 #2: ffffffbfef6537e0 (&cpu_rcache->lock){+.+.}-{2:2}, at: alloc_iova_fast+0x1ac/0x2ac Preemption disabled at: [<ffff800008b1908c>] otx2_rq_aura_pool_init+0x14c/0x284 CPU: 20 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc3-rt1-yocto-preempt-rt #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace.part.0+0xe8/0xf4 show_stack+0x20/0x30 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x224 rt_spin_lock+0x64/0x110 alloc_iova_fast+0x1ac/0x2ac iommu_dma_alloc_iova+0xd4/0x110 __iommu_dma_map+0x80/0x144 iommu_dma_map_page+0xe8/0x260 dma_map_page_attrs+0xb4/0xc0 __otx2_alloc_rbuf+0x90/0x150 otx2_rq_aura_pool_init+0x1c8/0x284 otx2_init_hw_resources+0xe4/0x3a4 otx2_open+0xf0/0x610 __dev_open+0x104/0x224 __dev_change_flags+0x1e4/0x274 dev_change_flags+0x2c/0x7c ic_open_devs+0x124/0x2f8 ip_auto_config+0x180/0x42c do_one_initcall+0x90/0x4dc do_basic_setup+0x10c/0x14c kernel_init_freeable+0x10c/0x13c kernel_init+0x2c/0x140 ret_from_fork+0x10/0x20 Of course, we can shuffle the get/put_cpu() to only wrap the invocation of ->aura_freeptr() as what commit 87b93b678e95 does. But there are only two ->aura_freeptr() callbacks, otx2_aura_freeptr() and cn10k_aura_freeptr(). There is no usage of perpcu variable in the otx2_aura_freeptr() at all, so the get/put_cpu() seems redundant to it. We can move the get/put_cpu() into the corresponding callback which really has the percpu variable usage and avoid the sprinkling of get/put_cpu() in several places.
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix deadlock when aborting transaction during relocation with scrub Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requires starting and committing a transaction, and if we have a failure in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we will deadlock if there is a paused scrub. That results in stack traces like the following: [42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6 [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction. [42.936] ------------[ cut here ]------------ [42.936] BTRFS: Transaction aborted (error -28) [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...) [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Code: ff ff 45 8b (...) [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282 [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000 [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8 [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00 [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0 [42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000 [42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0 [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42.936] Call Trace: [42.936] <TASK> [42.936] ? start_transaction+0xcb/0x610 [btrfs] [42.936] prepare_to_relocate+0x111/0x1a0 [btrfs] [42.936] relocate_block_group+0x57/0x5d0 [btrfs] [42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [42.936] ? __pfx_autoremove_wake_function+0x10/0x10 [42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [42.936] btrfs_balance+0x8ff/0x11d0 [btrfs] [42.936] ? __kmem_cache_alloc_node+0x14a/0x410 [42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs] [42.937] ? mod_objcg_state+0xd2/0x360 [42.937] ? refill_obj_stock+0xb0/0x160 [42.937] ? seq_release+0x25/0x30 [42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [42.937] ? percpu_counter_add_batch+0x2e/0xa0 [42.937] ? __x64_sys_ioctl+0x88/0xc0 [42.937] __x64_sys_ioctl+0x88/0xc0 [42.937] do_syscall_64+0x38/0x90 [42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc [42.937] RIP: 0033:0x7f381a6ffe9b [42.937] Code: 00 48 89 44 24 (...) [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [42.937] </TASK> [42.937] ---[ end trace 0000000000000000 ]--- [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds. [59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.196] "echo 0 > /proc/sys/kernel/hung_ ---truncated---
In the Linux kernel, the following vulnerability has been resolved: usb: dwc3: core: remove lock of otg mode during gadget suspend/resume to avoid deadlock When config CONFIG_USB_DWC3_DUAL_ROLE is selected, and trigger system to enter suspend status with below command: echo mem > /sys/power/state There will be a deadlock issue occurring. Detailed invoking path as below: dwc3_suspend_common() spin_lock_irqsave(&dwc->lock, flags); <-- 1st dwc3_gadget_suspend(dwc); dwc3_gadget_soft_disconnect(dwc); spin_lock_irqsave(&dwc->lock, flags); <-- 2nd This issue is exposed by commit c7ebd8149ee5 ("usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend") that removes the code of checking whether dwc->gadget_driver is NULL or not. It causes the following code is executed and deadlock occurs when trying to get the spinlock. In fact, the root cause is the commit 5265397f9442("usb: dwc3: Remove DWC3 locking during gadget suspend/resume") that forgot to remove the lock of otg mode. So, remove the redundant lock of otg mode during gadget suspend/resume.
In the Linux kernel, the following vulnerability has been resolved: net/smc: fix deadlock triggered by cancel_delayed_work_syn() The following LOCKDEP was detected: Workqueue: events smc_lgr_free_work [smc] WARNING: possible circular locking dependency detected 6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted ------------------------------------------------------ kworker/3:0/176251 is trying to acquire lock: 00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}, at: __flush_workqueue+0x7a/0x4f0 but task is already holding lock: 0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_work+0x76/0xf0 __cancel_work_timer+0x170/0x220 __smc_lgr_terminate.part.0+0x34/0x1c0 [smc] smc_connect_rdma+0x15e/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #3 (smc_client_lgr_pending){+.+.}-{3:3}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __mutex_lock+0x96/0x8e8 mutex_lock_nested+0x32/0x40 smc_connect_rdma+0xa4/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #2 (sk_lock-AF_SMC){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 lock_sock_nested+0x46/0xa8 smc_tx_work+0x34/0x50 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 process_one_work+0x2bc/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}: check_prev_add+0xd8/0xe88 validate_chain+0x70c/0xb20 __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_workqueue+0xaa/0x4f0 drain_workqueue+0xaa/0x158 destroy_workqueue+0x44/0x2d8 smc_lgr_free+0x9e/0xf8 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 other info that might help us debug this: Chain exists of: (wq_completion)smc_tx_wq-00000000#2 --> smc_client_lgr_pending --> (work_completion)(&(&lgr->free_work)->work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&lgr->free_work)->work)); lock(smc_client_lgr_pending); lock((work_completion) (&(&lgr->free_work)->work)); lock((wq_completion)smc_tx_wq-00000000#2); *** DEADLOCK *** 2 locks held by kworker/3:0/176251: #0: 0000000080183548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x232/0x730 #1: 0000037fffe97dc8 ((work_completion) (&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 stack backtr ---truncated---
In the Linux kernel, the following vulnerability has been resolved: net/mlx5e: Fix deadlock in tc route query code Cited commit causes ABBA deadlock[0] when peer flows are created while holding the devcom rw semaphore. Due to peer flows offload implementation the lock is taken much higher up the call chain and there is no obvious way to easily fix the deadlock. Instead, since tc route query code needs the peer eswitch structure only to perform a lookup in xarray and doesn't perform any sleeping operations with it, refactor the code for lockless execution in following ways: - RCUify the devcom 'data' pointer. When resetting the pointer synchronously wait for RCU grace period before returning. This is fine since devcom is currently only used for synchronization of pairing/unpairing of eswitches which is rare and already expensive as-is. - Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has already been used in some unlocked contexts without proper annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't an issue since all relevant code paths checked it again after obtaining the devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as "best effort" check to return NULL when devcom is being unpaired. Note that while RCU read lock doesn't prevent the unpaired flag from being changed concurrently it still guarantees that reader can continue to use 'data'. - Refactor mlx5e_tc_query_route_vport() function to use new mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock. [0]: [ 164.599612] ====================================================== [ 164.600142] WARNING: possible circular locking dependency detected [ 164.600667] 6.3.0-rc3+ #1 Not tainted [ 164.601021] ------------------------------------------------------ [ 164.601557] handler1/3456 is trying to acquire lock: [ 164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.603078] but task is already holding lock: [ 164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core] [ 164.604459] which lock already depends on the new lock. [ 164.605190] the existing dependency chain (in reverse order) is: [ 164.605848] -> #1 (&comp->sem){++++}-{3:3}: [ 164.606380] down_read+0x39/0x50 [ 164.606772] mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core] [ 164.607336] mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core] [ 164.607914] mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core] [ 164.608495] mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core] [ 164.609063] mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core] [ 164.609627] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core] [ 164.610175] mlx5e_configure_flower+0x952/0x1a20 [mlx5_core] [ 164.610741] tc_setup_cb_add+0xd4/0x200 [ 164.611146] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 164.611661] fl_change+0xc95/0x18a0 [cls_flower] [ 164.612116] tc_new_tfilter+0x3fc/0xd20 [ 164.612516] rtnetlink_rcv_msg+0x418/0x5b0 [ 164.612936] netlink_rcv_skb+0x54/0x100 [ 164.613339] netlink_unicast+0x190/0x250 [ 164.613746] netlink_sendmsg+0x245/0x4a0 [ 164.614150] sock_sendmsg+0x38/0x60 [ 164.614522] ____sys_sendmsg+0x1d0/0x1e0 [ 164.614934] ___sys_sendmsg+0x80/0xc0 [ 164.615320] __sys_sendmsg+0x51/0x90 [ 164.615701] do_syscall_64+0x3d/0x90 [ 164.616083] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 164.616568] -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}: [ 164.617210] __lock_acquire+0x159e/0x26e0 [ 164.617638] lock_acquire+0xc2/0x2a0 [ 164.618018] __mutex_lock+0x92/0xcd0 [ 164.618401] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.618943] post_process_attr+0x153/0x2d0 [ ---truncated---
In the Linux kernel, the following vulnerability has been resolved: md/raid10: prevent soft lockup while flush writes Currently, there is no limit for raid1/raid10 plugged bio. While flushing writes, raid1 has cond_resched() while raid10 doesn't, and too many writes can cause soft lockup. Follow up soft lockup can be triggered easily with writeback test for raid10 with ramdisks: watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293] Call Trace: <TASK> call_rcu+0x16/0x20 put_object+0x41/0x80 __delete_object+0x50/0x90 delete_object_full+0x2b/0x40 kmemleak_free+0x46/0xa0 slab_free_freelist_hook.constprop.0+0xed/0x1a0 kmem_cache_free+0xfd/0x300 mempool_free_slab+0x1f/0x30 mempool_free+0x3a/0x100 bio_free+0x59/0x80 bio_put+0xcf/0x2c0 free_r10bio+0xbf/0xf0 raid_end_bio_io+0x78/0xb0 one_write_done+0x8a/0xa0 raid10_end_write_request+0x1b4/0x430 bio_endio+0x175/0x320 brd_submit_bio+0x3b9/0x9b7 [brd] __submit_bio+0x69/0xe0 submit_bio_noacct_nocheck+0x1e6/0x5a0 submit_bio_noacct+0x38c/0x7e0 flush_pending_writes+0xf0/0x240 raid10d+0xac/0x1ed0 Fix the problem by adding cond_resched() to raid10 like what raid1 did. Note that unlimited plugged bio still need to be optimized, for example, in the case of lots of dirty pages writeback, this will take lots of memory and io will spend a long time in plug, hence io latency is bad.
In the Linux kernel, the following vulnerability has been resolved: sctp: add a refcnt in sctp_stream_priorities to avoid a nested loop With this refcnt added in sctp_stream_priorities, we don't need to traverse all streams to check if the prio is used by other streams when freeing one stream's prio in sctp_sched_prio_free_sid(). This can avoid a nested loop (up to 65535 * 65535), which may cause a stuck as Ying reported: watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [ksoftirqd/23:136] Call Trace: <TASK> sctp_sched_prio_free_sid+0xab/0x100 [sctp] sctp_stream_free_ext+0x64/0xa0 [sctp] sctp_stream_free+0x31/0x50 [sctp] sctp_association_free+0xa5/0x200 [sctp] Note that it doesn't need to use refcount_t type for this counter, as its accessing is always protected under the sock lock. v1->v2: - add a check in sctp_sched_prio_set to avoid the possible prio_head refcnt overflow.
In the Linux kernel, the following vulnerability has been resolved: octeontx2-pf: Avoid use of GFP_KERNEL in atomic context Using GFP_KERNEL in preemption disable context, causing below warning when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. [ 32.542271] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 [ 32.550883] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 [ 32.558707] preempt_count: 1, expected: 0 [ 32.562710] RCU nest depth: 0, expected: 0 [ 32.566800] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc2-00269-gae9dcb91c606 #7 [ 32.576188] Hardware name: Marvell CN106XX board (DT) [ 32.581232] Call trace: [ 32.583670] dump_backtrace.part.0+0xe0/0xf0 [ 32.587937] show_stack+0x18/0x30 [ 32.591245] dump_stack_lvl+0x68/0x84 [ 32.594900] dump_stack+0x18/0x34 [ 32.598206] __might_resched+0x12c/0x160 [ 32.602122] __might_sleep+0x48/0xa0 [ 32.605689] __kmem_cache_alloc_node+0x2b8/0x2e0 [ 32.610301] __kmalloc+0x58/0x190 [ 32.613610] otx2_sq_aura_pool_init+0x1a8/0x314 [ 32.618134] otx2_open+0x1d4/0x9d0 To avoid use of GFP_ATOMIC for memory allocation, disable preemption after all memory allocation is done.
In the Linux kernel, the following vulnerability has been resolved: ptdma: pt_core_execute_cmd() should use spinlock The interrupt handler (pt_core_irq_handler()) of the ptdma driver can be called from interrupt context. The code flow in this function can lead down to pt_core_execute_cmd() which will attempt to grab a mutex, which is not appropriate in interrupt context and ultimately leads to a kernel panic. The fix here changes this mutex to a spinlock, which has been verified to resolve the issue.
In the Linux kernel, the following vulnerability has been resolved: f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason() syzbot reports deadlock issue of f2fs as below: ====================================================== WARNING: possible circular locking dependency detected 6.12.0-rc3-syzkaller-00087-gc964ced77262 #0 Not tainted ------------------------------------------------------ kswapd0/79 is trying to acquire lock: ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2199 [inline] ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068 but task is already holding lock: ffff88804bd92610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sb_internal#2){.+.+}-{0:0}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1716 [inline] sb_start_intwrite+0x4d/0x1c0 include/linux/fs.h:1899 f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842 evict+0x4e8/0x9b0 fs/inode.c:725 f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807 evict+0x4e8/0x9b0 fs/inode.c:725 dispose_list fs/inode.c:774 [inline] prune_icache_sb+0x239/0x2f0 fs/inode.c:963 super_cache_scan+0x38c/0x4b0 fs/super.c:223 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4818 shrink_many mm/vmscan.c:4879 [inline] lru_gen_shrink_node mm/vmscan.c:4957 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5937 kswapd_shrink_node mm/vmscan.c:6765 [inline] balance_pgdat mm/vmscan.c:6957 [inline] kswapd+0x1ca3/0x3700 mm/vmscan.c:7226 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 -> #1 (fs_reclaim){+.+.}-{0:0}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __fs_reclaim_acquire mm/page_alloc.c:3834 [inline] fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848 might_alloc include/linux/sched/mm.h:318 [inline] prepare_alloc_pages+0x147/0x5b0 mm/page_alloc.c:4493 __alloc_pages_noprof+0x16f/0x710 mm/page_alloc.c:4722 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_pages_noprof mm/mempolicy.c:2345 [inline] folio_alloc_noprof+0x128/0x180 mm/mempolicy.c:2352 filemap_alloc_folio_noprof+0xdf/0x500 mm/filemap.c:1010 do_read_cache_folio+0x2eb/0x850 mm/filemap.c:3787 read_mapping_folio include/linux/pagemap.h:1011 [inline] f2fs_commit_super+0x3c0/0x7d0 fs/f2fs/super.c:4032 f2fs_record_stop_reason+0x13b/0x1d0 fs/f2fs/super.c:4079 f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174 f2fs_write_inode+0x35f/0x4d0 fs/f2fs/inode.c:785 write_inode fs/fs-writeback.c:1503 [inline] __writeback_single_inode+0x711/0x10d0 fs/fs-writeback.c:1723 writeback_single_inode+0x1f3/0x660 fs/fs-writeback.c:1779 sync_inode_metadata+0xc4/0x120 fs/fs-writeback.c:2849 f2fs_release_file+0xa8/0x100 fs/f2fs/file.c:1941 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:228 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f ---truncated---
In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: fix lock ordering in btrfs_zone_activate() The btrfs CI reported a lockdep warning as follows by running generic generic/129. WARNING: possible circular locking dependency detected 6.7.0-rc5+ #1 Not tainted ------------------------------------------------------ kworker/u5:5/793427 is trying to acquire lock: ffff88813256d028 (&cache->lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x5e/0x130 but task is already holding lock: ffff88810a23a318 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x34/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}: ... -> #0 (&cache->lock){+.+.}-{2:2}: ... This is because we take fs_info->zone_active_bgs_lock after a block_group's lock in btrfs_zone_activate() while doing the opposite in other places. Fix the issue by expanding the fs_info->zone_active_bgs_lock's critical section and taking it before a block_group's lock.
In the Linux kernel, the following vulnerability has been resolved: drm: Don't unref the same fb many times by mistake due to deadlock handling If we get a deadlock after the fb lookup in drm_mode_page_flip_ioctl() we proceed to unref the fb and then retry the whole thing from the top. But we forget to reset the fb pointer back to NULL, and so if we then get another error during the retry, before the fb lookup, we proceed the unref the same fb again without having gotten another reference. The end result is that the fb will (eventually) end up being freed while it's still in use. Reset fb to NULL once we've unreffed it to avoid doing it again until we've done another fb lookup. This turned out to be pretty easy to hit on a DG2 when doing async flips (and CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y). The first symptom I saw that drm_closefb() simply got stuck in a busy loop while walking the framebuffer list. Fortunately I was able to convince it to oops instead, and from there it was easier to track down the culprit.
In the Linux kernel, the following vulnerability has been resolved: can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. [mkl: remove unrelated newline change]
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: Fix lock dependency warning with srcu ====================================================== WARNING: possible circular locking dependency detected 6.5.0-kfd-yangp #2289 Not tainted ------------------------------------------------------ kworker/0:2/996 is trying to acquire lock: (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0 but task is already holding lock: ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at: process_one_work+0x211/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu] svm_range_set_attr+0xd6/0x14c0 [amdgpu] kfd_ioctl+0x1d1/0x630 [amdgpu] __x64_sys_ioctl+0x88/0xc0 -> #2 (&info->lock#2){+.+.}-{3:3}: __mutex_lock+0x99/0xc70 amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu] restore_process_helper+0x22/0x80 [amdgpu] restore_process_worker+0x2d/0xa0 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 -> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 __cancel_work_timer+0x12c/0x1c0 kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu] __mmu_notifier_release+0xad/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 do_exit+0x322/0xb90 do_group_exit+0x37/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x38/0x80 -> #0 (srcu){.+.+}-{0:0}: __lock_acquire+0x1521/0x2510 lock_sync+0x5f/0x90 __synchronize_srcu+0x4f/0x1a0 __mmu_notifier_release+0x128/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 svm_range_deferred_list_work+0x19f/0x350 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 other info that might help us debug this: Chain exists of: srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&info->lock#2); lock((work_completion)(&svms->deferred_list_work)); sync(srcu);