In the Linux kernel, the following vulnerability has been resolved: mptcp: fix possible deadlock in subflow diag Syzbot and Eric reported a lockdep splat in the subflow diag: WARNING: possible circular locking dependency detected 6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted syz-executor.2/24141 is trying to acquire lock: ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 but task is already holding lock: ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&h->lhash2[i].lock){+.+.}-{2:2}: lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743 inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261 __inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217 inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239 rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316 rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577 ops_init+0x352/0x610 net/core/net_namespace.c:136 __register_pernet_operations net/core/net_namespace.c:1214 [inline] register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283 register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370 rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735 do_one_initcall+0x238/0x830 init/main.c:1236 do_initcall_level+0x157/0x210 init/main.c:1298 do_initcalls+0x3f/0x80 init/main.c:1314 kernel_init_freeable+0x42f/0x5d0 init/main.c:1551 kernel_init+0x1d/0x2a0 init/main.c:1441 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242 -> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754 lock_sock_fast include/net/sock.h:1723 [inline] subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28 tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline] tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137 inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345 inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061 __inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263 inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371 netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264 __netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370 netlink_dump_start include/linux/netlink.h:338 [inline] inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405 sock_diag_rcv_msg+0xe7/0x410 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 As noted by Eric we can break the lock dependency chain avoid dumping ---truncated---
In the Linux kernel, the following vulnerability has been resolved: aoe: avoid potential deadlock at set_capacity Move set_capacity() outside of the section procected by (&d->lock). To avoid possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- [1] lock(&bdev->bd_size_lock); local_irq_disable(); [2] lock(&d->lock); [3] lock(&bdev->bd_size_lock); <Interrupt> [4] lock(&d->lock); *** DEADLOCK *** Where [1](&bdev->bd_size_lock) hold by zram_add()->set_capacity(). [2]lock(&d->lock) hold by aoeblk_gdalloc(). And aoeblk_gdalloc() is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call. In this situation an attempt to acquire [4]lock(&d->lock) from aoecmd_cfg_rsp() will lead to deadlock. So the simplest solution is breaking lock dependency [2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity() outside.
Guests can trigger deadlock in Linux netback driver T[his CNA information record relates to multiple CVEs; the text explains which aspects/vulnerabilities correspond to which CVE.] The patch for XSA-392 introduced another issue which might result in a deadlock when trying to free the SKB of a packet dropped due to the XSA-392 handling (CVE-2022-42328). Additionally when dropping packages for other reasons the same deadlock could occur in case of netpoll being active for the interface the xen-netback driver is connected to (CVE-2022-42329).
A flaw was found in the Linux kernel's Layer 2 Tunneling Protocol (L2TP). A missing lock when clearing sk_user_data can lead to a race condition and NULL pointer dereference. A local user could use this flaw to potentially crash the system causing a denial of service.
In the Linux kernel, the following vulnerability has been resolved: KVM: arm64: Fix circular locking dependency The rule inside kvm enforces that the vcpu->mutex is taken *inside* kvm->lock. The rule is violated by the pkvm_create_hyp_vm() which acquires the kvm->lock while already holding the vcpu->mutex lock from kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by protecting the hyp vm handle with the config_lock, much like we already do for other forms of VM-scoped data.
A race condition flaw was found in the Linux kernel sound subsystem due to improper locking. It could lead to a NULL pointer dereference while handling the SNDCTL_DSP_SYNC ioctl. A privileged local user (root or member of the audio group) could use this flaw to crash the system, resulting in a denial of service condition
In the Linux kernel, the following vulnerability has been resolved: audit: improve robustness of the audit queue handling If the audit daemon were ever to get stuck in a stopped state the kernel's kauditd_thread() could get blocked attempting to send audit records to the userspace audit daemon. With the kernel thread blocked it is possible that the audit queue could grow unbounded as certain audit record generating events must be exempt from the queue limits else the system enter a deadlock state. This patch resolves this problem by lowering the kernel thread's socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks the kauditd_send_queue() function to better manage the various audit queues when connection problems occur between the kernel and the audit daemon. With this patch, the backlog may temporarily grow beyond the defined limits when the audit daemon is stopped and the system is under heavy audit pressure, but kauditd_thread() will continue to make progress and drain the queues as it would for other connection problems. For example, with the audit daemon put into a stopped state and the system configured to audit every syscall it was still possible to shutdown the system without a kernel panic, deadlock, etc.; granted, the system was slow to shutdown but that is to be expected given the extreme pressure of recording every syscall. The timeout value of HZ/10 was chosen primarily through experimentation and this developer's "gut feeling". There is likely no one perfect value, but as this scenario is limited in scope (root privileges would be needed to send SIGSTOP to the audit daemon), it is likely not worth exposing this as a tunable at present. This can always be done at a later date if it proves necessary.
In the Linux kernel, the following vulnerability has been resolved: mtd: require write permissions for locking and badblock ioctls MEMLOCK, MEMUNLOCK and OTPLOCK modify protection bits. Thus require write permission. Depending on the hardware MEMLOCK might even be write-once, e.g. for SPI-NOR flashes with their WP# tied to GND. OTPLOCK is always write-once. MEMSETBADBLOCK modifies the bad block table.
In the Linux kernel, the following vulnerability has been resolved: scsi: ufs: Fix a deadlock in the error handler The following deadlock has been observed on a test setup: - All tags allocated - The SCSI error handler calls ufshcd_eh_host_reset_handler() - ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() - ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request.
In the Linux kernel, the following vulnerability has been resolved: s390/qeth: fix deadlock during failing recovery Commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery") removed taking discipline_mutex inside qeth_do_reset(), fixing potential deadlocks. An error path was missed though, that still takes discipline_mutex and thus has the original deadlock potential. Intermittent deadlocks were seen when a qeth channel path is configured offline, causing a race between qeth_do_reset and ccwgroup_remove. Call qeth_set_offline() directly in the qeth_do_reset() error case and then a new variant of ccwgroup_set_offline(), without taking discipline_mutex.
In the Linux kernel, the following vulnerability has been resolved: nitro_enclaves: Use get_user_pages_unlocked() call to handle mmap assert After commit 5b78ed24e8ec ("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()"), the call to get_user_pages() will trigger the mmap assert. static inline void mmap_assert_locked(struct mm_struct *mm) { lockdep_assert_held(&mm->mmap_lock); VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); } [ 62.521410] kernel BUG at include/linux/mmap_lock.h:156! ........................................................... [ 62.538938] RIP: 0010:find_vma+0x32/0x80 ........................................................... [ 62.605889] Call Trace: [ 62.608502] <TASK> [ 62.610956] ? lock_timer_base+0x61/0x80 [ 62.614106] find_extend_vma+0x19/0x80 [ 62.617195] __get_user_pages+0x9b/0x6a0 [ 62.620356] __gup_longterm_locked+0x42d/0x450 [ 62.623721] ? finish_wait+0x41/0x80 [ 62.626748] ? __kmalloc+0x178/0x2f0 [ 62.629768] ne_set_user_memory_region_ioctl.isra.0+0x225/0x6a0 [nitro_enclaves] [ 62.635776] ne_enclave_ioctl+0x1cf/0x6d7 [nitro_enclaves] [ 62.639541] __x64_sys_ioctl+0x82/0xb0 [ 62.642620] do_syscall_64+0x3b/0x90 [ 62.645642] entry_SYSCALL_64_after_hwframe+0x44/0xae Use get_user_pages_unlocked() when setting the enclave memory regions. That's a similar pattern as mmap_read_lock() used together with get_user_pages().
In the Linux kernel, the following vulnerability has been resolved: tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc When running ltp testcase(ltp/testcases/kernel/pty/pty04.c) with arm64, there is a soft lockup, which look like this one: Workqueue: events_unbound flush_to_ldisc Call trace: dump_backtrace+0x0/0x1ec show_stack+0x24/0x30 dump_stack+0xd0/0x128 panic+0x15c/0x374 watchdog_timer_fn+0x2b8/0x304 __run_hrtimer+0x88/0x2c0 __hrtimer_run_queues+0xa4/0x120 hrtimer_interrupt+0xfc/0x270 arch_timer_handler_phys+0x40/0x50 handle_percpu_devid_irq+0x94/0x220 __handle_domain_irq+0x88/0xf0 gic_handle_irq+0x84/0xfc el1_irq+0xc8/0x180 slip_unesc+0x80/0x214 [slip] tty_ldisc_receive_buf+0x64/0x80 tty_port_default_receive_buf+0x50/0x90 flush_to_ldisc+0xbc/0x110 process_one_work+0x1d4/0x4b0 worker_thread+0x180/0x430 kthread+0x11c/0x120 In the testcase pty04, The first process call the write syscall to send data to the pty master. At the same time, the workqueue will do the flush_to_ldisc to pop data in a loop until there is no more data left. When the sender and workqueue running in different core, the sender sends data fastly in full time which will result in workqueue doing work in loop for a long time and occuring softlockup in flush_to_ldisc with kernel configured without preempt. So I add need_resched check and cond_resched in the flush_to_ldisc loop to avoid it.
In the Linux kernel, the following vulnerability has been resolved: ubifs: Fix deadlock in concurrent rename whiteout and inode writeback Following hung tasks: [ 77.028764] task:kworker/u8:4 state:D stack: 0 pid: 132 [ 77.028820] Call Trace: [ 77.029027] schedule+0x8c/0x1b0 [ 77.029067] mutex_lock+0x50/0x60 [ 77.029074] ubifs_write_inode+0x68/0x1f0 [ubifs] [ 77.029117] __writeback_single_inode+0x43c/0x570 [ 77.029128] writeback_sb_inodes+0x259/0x740 [ 77.029148] wb_writeback+0x107/0x4d0 [ 77.029163] wb_workfn+0x162/0x7b0 [ 92.390442] task:aa state:D stack: 0 pid: 1506 [ 92.390448] Call Trace: [ 92.390458] schedule+0x8c/0x1b0 [ 92.390461] wb_wait_for_completion+0x82/0xd0 [ 92.390469] __writeback_inodes_sb_nr+0xb2/0x110 [ 92.390472] writeback_inodes_sb_nr+0x14/0x20 [ 92.390476] ubifs_budget_space+0x705/0xdd0 [ubifs] [ 92.390503] do_rename.cold+0x7f/0x187 [ubifs] [ 92.390549] ubifs_rename+0x8b/0x180 [ubifs] [ 92.390571] vfs_rename+0xdb2/0x1170 [ 92.390580] do_renameat2+0x554/0x770 , are caused by concurrent rename whiteout and inode writeback processes: rename_whiteout(Thread 1) wb_workfn(Thread2) ubifs_rename do_rename lock_4_inodes (Hold ui_mutex) ubifs_budget_space make_free_space shrink_liability __writeback_inodes_sb_nr bdi_split_work_to_wbs (Queue new wb work) wb_do_writeback(wb work) __writeback_single_inode ubifs_write_inode LOCK(ui_mutex) ↑ wb_wait_for_completion (Wait wb work) <-- deadlock! Reproducer (Detail program in [Link]): 1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT) 2. Consume out of space before kernel(mdelay) doing budget for whiteout Fix it by doing whiteout space budget before locking ubifs inodes. BTW, it also fixes wrong goto tag 'out_release' in whiteout budget error handling path(It should at least recover dir i_size and unlock 4 ubifs inodes).
In the Linux kernel, the following vulnerability has been resolved: powerpc/set_memory: Avoid spinlock recursion in change_page_attr() Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modify it to use pte_update() and do the operation safely against concurrent access. In the meantime, Maxime reported some spinlock recursion. [ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217 [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0 [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523 [ 15.373350] Workqueue: events do_free_init [ 15.377615] Call Trace: [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable) [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4 [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0 [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8 [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94 [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134 [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8 [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8 [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94 [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8 [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8 [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210 [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr(). To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combination of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare two sets to know which bits are set or cleared. For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit get set when changing exec permission.
In the Linux kernel, the following vulnerability has been resolved: powerpc/bpf: Fix detecting BPF atomic instructions Commit 91c960b0056672 ("bpf: Rename BPF_XADD and prepare to encode other atomics in .imm") converted BPF_XADD to BPF_ATOMIC and added a way to distinguish instructions based on the immediate field. Existing JIT implementations were updated to check for the immediate field and to reject programs utilizing anything more than BPF_ADD (such as BPF_FETCH) in the immediate field. However, the check added to powerpc64 JIT did not look at the correct BPF instruction. Due to this, such programs would be accepted and incorrectly JIT'ed resulting in soft lockups, as seen with the atomic bounds test. Fix this by looking at the correct immediate value.
In the Linux kernel, the following vulnerability has been resolved: mac80211: fix deadlock in AP/VLAN handling Syzbot reports that when you have AP_VLAN interfaces that are up and close the AP interface they belong to, we get a deadlock. No surprise - since we dev_close() them with the wiphy mutex held, which goes back into the netdev notifier in cfg80211 and tries to acquire the wiphy mutex there. To fix this, we need to do two things: 1) prevent changing iftype while AP_VLANs are up, we can't easily fix this case since cfg80211 already calls us with the wiphy mutex held, but change_interface() is relatively rare in drivers anyway, so changing iftype isn't used much (and userspace has to fall back to down/change/up anyway) 2) pull the dev_close() loop over VLANs out of the wiphy mutex section in the normal stop case
In the Linux kernel, the following vulnerability has been resolved: net: systemport: Add global locking for descriptor lifecycle The descriptor list is a shared resource across all of the transmit queues, and the locking mechanism used today only protects concurrency across a given transmit queue between the transmit and reclaiming. This creates an opportunity for the SYSTEMPORT hardware to work on corrupted descriptors if we have multiple producers at once which is the case when using multiple transmit queues. This was particularly noticeable when using multiple flows/transmit queues and it showed up in interesting ways in that UDP packets would get a correct UDP header checksum being calculated over an incorrect packet length. Similarly TCP packets would get an equally correct checksum computed by the hardware over an incorrect packet length. The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges when the driver produces a new descriptor anytime it writes to the WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to re-organize its descriptors and it is possible that concurrent TX queues eventually break this internal allocation scheme to the point where the length/status part of the descriptor gets used for an incorrect data buffer. The fix is to impose a global serialization for all TX queues in the short section where we are writing to the WRITE_PORT_{HI,LO} registers which solves the corruption even with multiple concurrent TX queues being used.
In the Linux kernel, the following vulnerability has been resolved: mptcp: fix deadlock in __mptcp_push_pending() __mptcp_push_pending() may call mptcp_flush_join_list() with subflow socket lock held. If such call hits mptcp_sockopt_sync_all() then subsequently __mptcp_sockopt_sync() could try to lock the subflow socket for itself, causing a deadlock. sysrq: Show Blocked State task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x2d6/0x10c0 ? __mod_memcg_state+0x4d/0x70 ? csum_partial+0xd/0x20 ? _raw_spin_lock_irqsave+0x26/0x50 schedule+0x4e/0xc0 __lock_sock+0x69/0x90 ? do_wait_intr_irq+0xa0/0xa0 __lock_sock_fast+0x35/0x50 mptcp_sockopt_sync_all+0x38/0xc0 __mptcp_push_pending+0x105/0x200 mptcp_sendmsg+0x466/0x490 sock_sendmsg+0x57/0x60 __sys_sendto+0xf0/0x160 ? do_wait_intr_irq+0xa0/0xa0 ? fpregs_restore_userregs+0x12/0xd0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9ba546c2d0 RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0 RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234 RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060 R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8 </TASK> Fix the issue by using __mptcp_flush_join_list() instead of plain mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by Florian. The sockopt sync will be deferred to the workqueue.
A deadlock flaw was found in the Linux kernel’s BPF subsystem. This flaw allows a local user to potentially crash the system.
A vulnerability was found in btrfs_alloc_tree_b in fs/btrfs/extent-tree.c in the Linux kernel due to an improper lock operation in btrfs. In this flaw, a user with a local privilege may cause a denial of service (DOS) due to a deadlock problem.
In the Linux kernel, the following vulnerability has been resolved: ceph: fix deadlock or deadcode of misusing dget() The lock order is incorrect between denty and its parent, we should always make sure that the parent get the lock first. But since this deadcode is never used and the parent dir will always be set from the callers, let's just remove it.
A denial of service vulnerability was found in tipc_crypto_key_revoke in net/tipc/crypto.c in the Linux kernel’s TIPC subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system.
A denial of service vulnerability due to a deadlock was found in sctp_auto_asconf_init in net/sctp/socket.c in the Linux kernel’s SCTP subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system.
In the Linux kernel, the following vulnerability has been resolved: NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt The loop inside nfs_netfs_issue_read() currently does not disable interrupts while iterating through pages in the xarray to submit for NFS read. This is not safe though since after taking xa_lock, another page in the mapping could be processed for writeback inside an interrupt, and deadlock can occur. The fix is simple and clean if we use xa_for_each_range(), which handles the iteration with RCU while reducing code complexity. The problem is easily reproduced with the following test: mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1 echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/nfs/file1.bin of=/dev/null umount /mnt/nfs On the console with a lockdep-enabled kernel a message similar to the following will be seen: ================================ WARNING: inconsistent lock state 6.7.0-lockdbg+ #10 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at: nfs_netfs_issue_read+0x1b2/0x4b0 [nfs] {IN-SOFTIRQ-W} state was registered at: lock_acquire+0x144/0x380 _raw_spin_lock_irqsave+0x4e/0xa0 __folio_end_writeback+0x17e/0x5c0 folio_end_writeback+0x93/0x1b0 iomap_finish_ioend+0xeb/0x6a0 blk_update_request+0x204/0x7f0 blk_mq_end_request+0x30/0x1c0 blk_complete_reqs+0x7e/0xa0 __do_softirq+0x113/0x544 __irq_exit_rcu+0xfe/0x120 irq_exit_rcu+0xe/0x20 sysvec_call_function_single+0x6f/0x90 asm_sysvec_call_function_single+0x1a/0x20 pv_native_safe_halt+0xf/0x20 default_idle+0x9/0x20 default_idle_call+0x67/0xa0 do_idle+0x2b5/0x300 cpu_startup_entry+0x34/0x40 start_secondary+0x19d/0x1c0 secondary_startup_64_no_verify+0x18f/0x19b irq event stamp: 176891 hardirqs last enabled at (176891): [<ffffffffa67a0be4>] _raw_spin_unlock_irqrestore+0x44/0x60 hardirqs last disabled at (176890): [<ffffffffa67a0899>] _raw_spin_lock_irqsave+0x79/0xa0 softirqs last enabled at (176646): [<ffffffffa515d91e>] __irq_exit_rcu+0xfe/0x120 softirqs last disabled at (176633): [<ffffffffa515d91e>] __irq_exit_rcu+0xfe/0x120 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xa->xa_lock#4); <Interrupt> lock(&xa->xa_lock#4); *** DEADLOCK *** 2 locks held by test5/1708: #0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at: nfs_start_io_read+0x28/0x90 [nfs] #1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at: page_cache_ra_unbounded+0xa4/0x280 stack backtrace: CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 Call Trace: dump_stack_lvl+0x5b/0x90 mark_lock+0xb3f/0xd20 __lock_acquire+0x77b/0x3360 _raw_spin_lock+0x34/0x80 nfs_netfs_issue_read+0x1b2/0x4b0 [nfs] netfs_begin_read+0x77f/0x980 [netfs] nfs_netfs_readahead+0x45/0x60 [nfs] nfs_readahead+0x323/0x5a0 [nfs] read_pages+0xf3/0x5c0 page_cache_ra_unbounded+0x1c8/0x280 filemap_get_pages+0x38c/0xae0 filemap_read+0x206/0x5e0 nfs_file_read+0xb7/0x140 [nfs] vfs_read+0x2a9/0x460 ksys_read+0xb7/0x140
In the Linux kernel, the following vulnerability has been resolved: tty: fix deadlock caused by calling printk() under tty_port->lock pty_write() invokes kmalloc() which may invoke a normal printk() to print failure message. This can cause a deadlock in the scenario reported by syz-bot below: CPU0 CPU1 CPU2 ---- ---- ---- lock(console_owner); lock(&port_lock_key); lock(&port->lock); lock(&port_lock_key); lock(&port->lock); lock(console_owner); As commit dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") said, such deadlock can be prevented by using printk_deferred() in kmalloc() (which is invoked in the section guarded by the port->lock). But there are too many printk() on the kmalloc() path, and kmalloc() can be called from anywhere, so changing printk() to printk_deferred() is too complicated and inelegant. Therefore, this patch chooses to specify __GFP_NOWARN to kmalloc(), so that printk() will not be called, and this deadlock problem can be avoided. Syzbot reported the following lockdep error: ====================================================== WARNING: possible circular locking dependency detected 5.4.143-00237-g08ccc19a-dirty #10 Not tainted ------------------------------------------------------ syz-executor.4/29420 is trying to acquire lock: ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: console_trylock_spinning kernel/printk/printk.c:1752 [inline] ffffffff8aedb2a0 (console_owner){....}-{0:0}, at: vprintk_emit+0x2ca/0x470 kernel/printk/printk.c:2023 but task is already holding lock: ffff8880119c9158 (&port->lock){-.-.}-{2:2}, at: pty_write+0xf4/0x1f0 drivers/tty/pty.c:120 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&port->lock){-.-.}-{2:2}: __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159 tty_port_tty_get drivers/tty/tty_port.c:288 [inline] <-- lock(&port->lock); tty_port_default_wakeup+0x1d/0xb0 drivers/tty/tty_port.c:47 serial8250_tx_chars+0x530/0xa80 drivers/tty/serial/8250/8250_port.c:1767 serial8250_handle_irq.part.0+0x31f/0x3d0 drivers/tty/serial/8250/8250_port.c:1854 serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1827 [inline] <-- lock(&port_lock_key); serial8250_default_handle_irq+0xb2/0x220 drivers/tty/serial/8250/8250_port.c:1870 serial8250_interrupt+0xfd/0x200 drivers/tty/serial/8250/8250_core.c:126 __handle_irq_event_percpu+0x109/0xa50 kernel/irq/handle.c:156 [...] -> #1 (&port_lock_key){-.-.}-{2:2}: __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159 serial8250_console_write+0x184/0xa40 drivers/tty/serial/8250/8250_port.c:3198 <-- lock(&port_lock_key); call_console_drivers kernel/printk/printk.c:1819 [inline] console_unlock+0x8cb/0xd00 kernel/printk/printk.c:2504 vprintk_emit+0x1b5/0x470 kernel/printk/printk.c:2024 <-- lock(console_owner); vprintk_func+0x8d/0x250 kernel/printk/printk_safe.c:394 printk+0xba/0xed kernel/printk/printk.c:2084 register_console+0x8b3/0xc10 kernel/printk/printk.c:2829 univ8250_console_init+0x3a/0x46 drivers/tty/serial/8250/8250_core.c:681 console_init+0x49d/0x6d3 kernel/printk/printk.c:2915 start_kernel+0x5e9/0x879 init/main.c:713 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 -> #0 (console_owner){....}-{0:0}: [...] lock_acquire+0x127/0x340 kernel/locking/lockdep.c:4734 console_trylock_spinning kernel/printk/printk.c:1773 ---truncated---
In the Linux kernel, the following vulnerability has been resolved: drivers: tty: serial: Fix deadlock in sa1100_set_termios() There is a deadlock in sa1100_set_termios(), which is shown below: (Thread 1) | (Thread 2) | sa1100_enable_ms() sa1100_set_termios() | mod_timer() spin_lock_irqsave() //(1) | (wait a time) ... | sa1100_timeout() del_timer_sync() | spin_lock_irqsave() //(2) (wait timer to stop) | ... We hold sport->port.lock in position (1) of thread 1 and use del_timer_sync() to wait timer to stop, but timer handler also need sport->port.lock in position (2) of thread 2. As a result, sa1100_set_termios() will block forever. This patch moves del_timer_sync() before spin_lock_irqsave() in order to prevent the deadlock.
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: Fix lock dependency warning with srcu ====================================================== WARNING: possible circular locking dependency detected 6.5.0-kfd-yangp #2289 Not tainted ------------------------------------------------------ kworker/0:2/996 is trying to acquire lock: (srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0 but task is already holding lock: ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at: process_one_work+0x211/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu] svm_range_set_attr+0xd6/0x14c0 [amdgpu] kfd_ioctl+0x1d1/0x630 [amdgpu] __x64_sys_ioctl+0x88/0xc0 -> #2 (&info->lock#2){+.+.}-{3:3}: __mutex_lock+0x99/0xc70 amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu] restore_process_helper+0x22/0x80 [amdgpu] restore_process_worker+0x2d/0xa0 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 -> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}: __flush_work+0x88/0x4f0 __cancel_work_timer+0x12c/0x1c0 kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu] __mmu_notifier_release+0xad/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 do_exit+0x322/0xb90 do_group_exit+0x37/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x38/0x80 -> #0 (srcu){.+.+}-{0:0}: __lock_acquire+0x1521/0x2510 lock_sync+0x5f/0x90 __synchronize_srcu+0x4f/0x1a0 __mmu_notifier_release+0x128/0x240 exit_mmap+0x6a/0x3a0 mmput+0x6a/0x120 svm_range_deferred_list_work+0x19f/0x350 [amdgpu] process_one_work+0x29b/0x560 worker_thread+0x3d/0x3d0 other info that might help us debug this: Chain exists of: srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&info->lock#2); lock((work_completion)(&svms->deferred_list_work)); sync(srcu);
In the Linux kernel, the following vulnerability has been resolved: ptdma: pt_core_execute_cmd() should use spinlock The interrupt handler (pt_core_irq_handler()) of the ptdma driver can be called from interrupt context. The code flow in this function can lead down to pt_core_execute_cmd() which will attempt to grab a mutex, which is not appropriate in interrupt context and ultimately leads to a kernel panic. The fix here changes this mutex to a spinlock, which has been verified to resolve the issue.
In the Linux kernel, the following vulnerability has been resolved: bus: mhi: host: Drop chan lock before queuing buffers Ensure read and write locks for the channel are not taken in succession by dropping the read lock from parse_xfer_event() such that a callback given to client can potentially queue buffers and acquire the write lock in that process. Any queueing of buffers should be done without channel read lock acquired as it can result in multiple locks and a soft lockup. [mani: added fixes tag and cc'ed stable]
In the Linux kernel, the following vulnerability has been resolved: mm/swapfile: add cond_resched() in get_swap_pages() The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup.
In the Linux kernel, the following vulnerability has been resolved: net: enetc: avoid deadlock in enetc_tx_onestep_tstamp() This lockdep splat says it better than I could: ================================ WARNING: inconsistent lock state 6.2.0-rc2-07010-ga9b9500ffaac-dirty #967 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kworker/1:3/179 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff3ec4036ce098 (_xmit_ETHER#2){+.?.}-{3:3}, at: netif_freeze_queues+0x5c/0xc0 {IN-SOFTIRQ-W} state was registered at: _raw_spin_lock+0x5c/0xc0 sch_direct_xmit+0x148/0x37c __dev_queue_xmit+0x528/0x111c ip6_finish_output2+0x5ec/0xb7c ip6_finish_output+0x240/0x3f0 ip6_output+0x78/0x360 ndisc_send_skb+0x33c/0x85c ndisc_send_rs+0x54/0x12c addrconf_rs_timer+0x154/0x260 call_timer_fn+0xb8/0x3a0 __run_timers.part.0+0x214/0x26c run_timer_softirq+0x3c/0x74 __do_softirq+0x14c/0x5d8 ____do_softirq+0x10/0x20 call_on_irq_stack+0x2c/0x5c do_softirq_own_stack+0x1c/0x30 __irq_exit_rcu+0x168/0x1a0 irq_exit_rcu+0x10/0x40 el1_interrupt+0x38/0x64 irq event stamp: 7825 hardirqs last enabled at (7825): [<ffffdf1f7200cae4>] exit_to_kernel_mode+0x34/0x130 hardirqs last disabled at (7823): [<ffffdf1f708105f0>] __do_softirq+0x550/0x5d8 softirqs last enabled at (7824): [<ffffdf1f7081050c>] __do_softirq+0x46c/0x5d8 softirqs last disabled at (7811): [<ffffdf1f708166e0>] ____do_softirq+0x10/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(_xmit_ETHER#2); <Interrupt> lock(_xmit_ETHER#2); *** DEADLOCK *** 3 locks held by kworker/1:3/179: #0: ffff3ec400004748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0 #1: ffff80000a0bbdc8 ((work_completion)(&priv->tx_onestep_tstamp)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x6c0 #2: ffff3ec4036cd438 (&dev->tx_global_lock){+.+.}-{3:3}, at: netif_tx_lock+0x1c/0x34 Workqueue: events enetc_tx_onestep_tstamp Call trace: print_usage_bug.part.0+0x208/0x22c mark_lock+0x7f0/0x8b0 __lock_acquire+0x7c4/0x1ce0 lock_acquire.part.0+0xe0/0x220 lock_acquire+0x68/0x84 _raw_spin_lock+0x5c/0xc0 netif_freeze_queues+0x5c/0xc0 netif_tx_lock+0x24/0x34 enetc_tx_onestep_tstamp+0x20/0x100 process_one_work+0x28c/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x11c but I'll say it anyway: the enetc_tx_onestep_tstamp() work item runs in process context, therefore with softirqs enabled (i.o.w., it can be interrupted by a softirq). If we hold the netif_tx_lock() when there is an interrupt, and the NET_TX softirq then gets scheduled, this will take the netif_tx_lock() a second time and deadlock the kernel. To solve this, use netif_tx_lock_bh(), which blocks softirqs from running.
In the Linux kernel, the following vulnerability has been resolved: Input: cyapa - add missing input core locking to suspend/resume functions Grab input->mutex during suspend/resume functions like it is done in other input drivers. This fixes the following warning during system suspend/resume cycle on Samsung Exynos5250-based Snow Chromebook: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1680 at drivers/input/input.c:2291 input_device_enabled+0x68/0x6c Modules linked in: ... CPU: 1 PID: 1680 Comm: kworker/u4:12 Tainted: G W 6.6.0-rc5-next-20231009 #14109 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_unbound async_run_entry_fn unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0x1a8/0x1cc __warn from warn_slowpath_fmt+0x18c/0x1b4 warn_slowpath_fmt from input_device_enabled+0x68/0x6c input_device_enabled from cyapa_gen3_set_power_mode+0x13c/0x1dc cyapa_gen3_set_power_mode from cyapa_reinitialize+0x10c/0x15c cyapa_reinitialize from cyapa_resume+0x48/0x98 cyapa_resume from dpm_run_callback+0x90/0x298 dpm_run_callback from device_resume+0xb4/0x258 device_resume from async_resume+0x20/0x64 async_resume from async_run_entry_fn+0x40/0x15c async_run_entry_fn from process_scheduled_works+0xbc/0x6a8 process_scheduled_works from worker_thread+0x188/0x454 worker_thread from kthread+0x108/0x140 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf1625fb0 to 0xf1625ff8) ... ---[ end trace 0000000000000000 ]--- ... ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1680 at drivers/input/input.c:2291 input_device_enabled+0x68/0x6c Modules linked in: ... CPU: 1 PID: 1680 Comm: kworker/u4:12 Tainted: G W 6.6.0-rc5-next-20231009 #14109 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_unbound async_run_entry_fn unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0x1a8/0x1cc __warn from warn_slowpath_fmt+0x18c/0x1b4 warn_slowpath_fmt from input_device_enabled+0x68/0x6c input_device_enabled from cyapa_gen3_set_power_mode+0x13c/0x1dc cyapa_gen3_set_power_mode from cyapa_reinitialize+0x10c/0x15c cyapa_reinitialize from cyapa_resume+0x48/0x98 cyapa_resume from dpm_run_callback+0x90/0x298 dpm_run_callback from device_resume+0xb4/0x258 device_resume from async_resume+0x20/0x64 async_resume from async_run_entry_fn+0x40/0x15c async_run_entry_fn from process_scheduled_works+0xbc/0x6a8 process_scheduled_works from worker_thread+0x188/0x454 worker_thread from kthread+0x108/0x140 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf1625fb0 to 0xf1625ff8) ... ---[ end trace 0000000000000000 ]---
In the Linux kernel, the following vulnerability has been resolved: PM: sleep: Fix possible deadlocks in core system-wide PM code It is reported that in low-memory situations the system-wide resume core code deadlocks, because async_schedule_dev() executes its argument function synchronously if it cannot allocate memory (and not only in that case) and that function attempts to acquire a mutex that is already held. Executing the argument function synchronously from within dpm_async_fn() may also be problematic for ordering reasons (it may cause a consumer device's resume callback to be invoked before a requisite supplier device's one, for example). Address this by changing the code in question to use async_schedule_dev_nocall() for scheduling the asynchronous execution of device suspend and resume functions and to directly run them synchronously if async_schedule_dev_nocall() returns false.
In the Linux kernel, the following vulnerability has been resolved: wifi: rt2x00: restart beacon queue when hardware reset When a hardware reset is triggered, all registers are reset, so all queues are forced to stop in hardware interface. However, mac80211 will not automatically stop the queue. If we don't manually stop the beacon queue, the queue will be deadlocked and unable to start again. This patch fixes the issue where Apple devices cannot connect to the AP after calling ieee80211_restart_hw().
In the Linux kernel, the following vulnerability has been resolved: Bluetooth: Fix possible deadlock in rfcomm_sk_state_change syzbot reports a possible deadlock in rfcomm_sk_state_change [1]. While rfcomm_sock_connect acquires the sk lock and waits for the rfcomm lock, rfcomm_sock_release could have the rfcomm lock and hit a deadlock for acquiring the sk lock. Here's a simplified flow: rfcomm_sock_connect: lock_sock(sk) rfcomm_dlc_open: rfcomm_lock() rfcomm_sock_release: rfcomm_sock_shutdown: rfcomm_lock() __rfcomm_dlc_close: rfcomm_k_state_change: lock_sock(sk) This patch drops the sk lock before calling rfcomm_dlc_open to avoid the possible deadlock and holds sk's reference count to prevent use-after-free after rfcomm_dlc_open completes.
In the Linux kernel, the following vulnerability has been resolved: phy: lynx-28g: serialize concurrent phy_set_mode_ext() calls to shared registers The protocol converter configuration registers PCC8, PCCC, PCCD (implemented by the driver), as well as others, control protocol converters from multiple lanes (each represented as a different struct phy). So, if there are simultaneous calls to phy_set_mode_ext() to lanes sharing the same PCC register (either for the "old" or for the "new" protocol), corruption of the values programmed to hardware is possible, because lynx_28g_rmw() has no locking. Add a spinlock in the struct lynx_28g_priv shared by all lanes, and take the global spinlock from the phy_ops :: set_mode() implementation. There are no other callers which modify PCC registers.
In the Linux kernel, the following vulnerability has been resolved: can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. [mkl: remove unrelated newline change]
In the Linux kernel, the following vulnerability has been resolved: drm: Don't unref the same fb many times by mistake due to deadlock handling If we get a deadlock after the fb lookup in drm_mode_page_flip_ioctl() we proceed to unref the fb and then retry the whole thing from the top. But we forget to reset the fb pointer back to NULL, and so if we then get another error during the retry, before the fb lookup, we proceed the unref the same fb again without having gotten another reference. The end result is that the fb will (eventually) end up being freed while it's still in use. Reset fb to NULL once we've unreffed it to avoid doing it again until we've done another fb lookup. This turned out to be pretty easy to hit on a DG2 when doing async flips (and CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y). The first symptom I saw that drm_closefb() simply got stuck in a busy loop while walking the framebuffer list. Fortunately I was able to convince it to oops instead, and from there it was easier to track down the culprit.
In the Linux kernel, the following vulnerability has been resolved: IB/ipoib: Fix mcast list locking Releasing the `priv->lock` while iterating the `priv->multicast_list` in `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to remove the items while in the middle of iteration. If the mcast is removed while the lock was dropped, the for loop spins forever resulting in a hard lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel): Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below) -----------------------------------+----------------------------------- ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work) spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...) list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev) &priv->multicast_list, list) | ipoib_mcast_join(dev, mcast) | spin_unlock_irq(&priv->lock) | | spin_lock_irqsave(&priv->lock, flags) | list_for_each_entry_safe(mcast, tmcast, | &priv->multicast_list, list) | list_del(&mcast->list); | list_add_tail(&mcast->list, &remove_list) | spin_unlock_irqrestore(&priv->lock, flags) spin_lock_irq(&priv->lock) | | ipoib_mcast_remove_list(&remove_list) (Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast, `priv->multicast_list` and we keep | remove_list, list) spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done) the other thread which is blocked | and the list is still valid on | it's stack.) Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent eventual sleeps. Unfortunately we could not reproduce the lockup and confirm this fix but based on the code review I think this fix should address such lockups. crash> bc 31 PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" -- [exception RIP: ipoib_mcast_join_task+0x1b1] RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 work (&priv->mcast_task{,.work}) RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 &mcast->list RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 mcast R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 dev priv (&priv->lock) &priv->multicast_list (aka head) ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f199a8c7e68 ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 (empty) crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, mcast_mutex.owner.counter = 0xff1c69998efec000 crash> b 8 PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" -- #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] #7 [ff ---truncated---
In the Linux kernel, the following vulnerability has been resolved: sysv: don't call sb_bread() with pointers_lock held syzbot is reporting sleep in atomic context in SysV filesystem [1], for sb_bread() is called with rw_spinlock held. A "write_lock(&pointers_lock) => read_lock(&pointers_lock) deadlock" bug and a "sb_bread() with write_lock(&pointers_lock)" bug were introduced by "Replace BKL for chain locking with sysvfs-private rwlock" in Linux 2.5.12. Then, "[PATCH] err1-40: sysvfs locking fix" in Linux 2.6.8 fixed the former bug by moving pointers_lock lock to the callers, but instead introduced a "sb_bread() with read_lock(&pointers_lock)" bug (which made this problem easier to hit). Al Viro suggested that why not to do like get_branch()/get_block()/ find_shared() in Minix filesystem does. And doing like that is almost a revert of "[PATCH] err1-40: sysvfs locking fix" except that get_branch() from with find_shared() is called without write_lock(&pointers_lock).
In the Linux kernel, the following vulnerability has been resolved: dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock __dma_entry_alloc_check_leak() calls into printk -> serial console output (qcom geni) and grabs port->lock under free_entries_lock spin lock, which is a reverse locking dependency chain as qcom_geni IRQ handler can call into dma-debug code and grab free_entries_lock under port->lock. Move __dma_entry_alloc_check_leak() call out of free_entries_lock scope so that we don't acquire serial console's port->lock under it. Trimmed-down lockdep splat: The existing dependency chain (in reverse order) is: -> #2 (free_entries_lock){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x60/0x80 dma_entry_alloc+0x38/0x110 debug_dma_map_page+0x60/0xf8 dma_map_page_attrs+0x1e0/0x230 dma_map_single_attrs.constprop.0+0x6c/0xc8 geni_se_rx_dma_prep+0x40/0xcc qcom_geni_serial_isr+0x310/0x510 __handle_irq_event_percpu+0x110/0x244 handle_irq_event_percpu+0x20/0x54 handle_irq_event+0x50/0x88 handle_fasteoi_irq+0xa4/0xcc handle_irq_desc+0x28/0x40 generic_handle_domain_irq+0x24/0x30 gic_handle_irq+0xc4/0x148 do_interrupt_handler+0xa4/0xb0 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 arch_local_irq_enable+0x4/0x8 ____do_softirq+0x18/0x24 ... -> #1 (&port_lock_key){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x60/0x80 qcom_geni_serial_console_write+0x184/0x1dc console_flush_all+0x344/0x454 console_unlock+0x94/0xf0 vprintk_emit+0x238/0x24c vprintk_default+0x3c/0x48 vprintk+0xb4/0xbc _printk+0x68/0x90 register_console+0x230/0x38c uart_add_one_port+0x338/0x494 qcom_geni_serial_probe+0x390/0x424 platform_probe+0x70/0xc0 really_probe+0x148/0x280 __driver_probe_device+0xfc/0x114 driver_probe_device+0x44/0x100 __device_attach_driver+0x64/0xdc bus_for_each_drv+0xb0/0xd8 __device_attach+0xe4/0x140 device_initial_probe+0x1c/0x28 bus_probe_device+0x44/0xb0 device_add+0x538/0x668 of_device_add+0x44/0x50 of_platform_device_create_pdata+0x94/0xc8 of_platform_bus_create+0x270/0x304 of_platform_populate+0xac/0xc4 devm_of_platform_populate+0x60/0xac geni_se_probe+0x154/0x160 platform_probe+0x70/0xc0 ... -> #0 (console_owner){-...}-{0:0}: __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 console_flush_all+0x330/0x454 console_unlock+0x94/0xf0 vprintk_emit+0x238/0x24c vprintk_default+0x3c/0x48 vprintk+0xb4/0xbc _printk+0x68/0x90 dma_entry_alloc+0xb4/0x110 debug_dma_map_sg+0xdc/0x2f8 __dma_map_sg_attrs+0xac/0xe4 dma_map_sgtable+0x30/0x4c get_pages+0x1d4/0x1e4 [msm] msm_gem_pin_pages_locked+0x38/0xac [msm] msm_gem_pin_vma_locked+0x58/0x88 [msm] msm_ioctl_gem_submit+0xde4/0x13ac [msm] drm_ioctl_kernel+0xe0/0x15c drm_ioctl+0x2e8/0x3f4 vfs_ioctl+0x30/0x50 ... Chain exists of: console_owner --> &port_lock_key --> free_entries_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(free_entries_lock); lock(&port_lock_key); lock(free_entries_lock); lock(console_owner); *** DEADLOCK *** Call trace: dump_backtrace+0xb4/0xf0 show_stack+0x20/0x30 dump_stack_lvl+0x60/0x84 dump_stack+0x18/0x24 print_circular_bug+0x1cc/0x234 check_noncircular+0x78/0xac __lock_acquire+0xdf8/0x109c lock_acquire+0x234/0x284 console_flush_all+0x330/0x454 consol ---truncated---
In the Linux kernel, the following vulnerability has been resolved: io_uring: lock overflowing for IOPOLL syzbot reports an issue with overflow filling for IOPOLL: WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Workqueue: events_unbound io_ring_exit_work Call trace: io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773 io_fill_cqe_req io_uring/io_uring.h:168 [inline] io_do_iopoll+0x474/0x62c io_uring/rw.c:1065 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289 worker_thread+0x340/0x610 kernel/workqueue.c:2436 kthread+0x12c/0x158 kernel/kthread.c:376 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863 There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path.
In the Linux kernel, the following vulnerability has been resolved: iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range When running an SVA case, the following soft lockup is triggered: -------------------------------------------------------------------- watchdog: BUG: soft lockup - CPU#244 stuck for 26s! pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50 sp : ffff8000d83ef290 x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000 x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000 x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0 x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0 x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001 Call trace: arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 __arm_smmu_tlb_inv_range+0x118/0x254 arm_smmu_tlb_inv_range_asid+0x6c/0x130 arm_smmu_mm_invalidate_range+0xa0/0xa4 __mmu_notifier_invalidate_range_end+0x88/0x120 unmap_vmas+0x194/0x1e0 unmap_region+0xb4/0x144 do_mas_align_munmap+0x290/0x490 do_mas_munmap+0xbc/0x124 __vm_munmap+0xa8/0x19c __arm64_sys_munmap+0x28/0x50 invoke_syscall+0x78/0x11c el0_svc_common.constprop.0+0x58/0x1c0 do_el0_svc+0x34/0x60 el0_svc+0x2c/0xd4 el0t_64_sync_handler+0x114/0x140 el0t_64_sync+0x1a4/0x1a8 -------------------------------------------------------------------- Note that since 6.6-rc1 the arm_smmu_mm_invalidate_range above is renamed to "arm_smmu_mm_arch_invalidate_secondary_tlbs", yet the problem remains. The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur to SMMU too, since arm_smmu_mm_arch_invalidate_secondary_tlbs() is called typically next to MMU tlb flush function, e.g. tlb_flush_mmu_tlbonly { tlb_flush { __flush_tlb_range { // check MAX_TLBI_OPS } } mmu_notifier_arch_invalidate_secondary_tlbs { arm_smmu_mm_arch_invalidate_secondary_tlbs { // does not check MAX_TLBI_OPS } } } Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h, since in an SVA case SMMU uses the CPU page table, so it makes sense to align with the tlbflush code. Then, replace per-page TLBI commands with a single per-asid TLBI command, if the request size hits this threshold.
In the Linux kernel, the following vulnerability has been resolved: scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard lockup call trace hangs the system. Call Trace: _raw_spin_lock_irqsave+0x32/0x40 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc] lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc] lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc] lpfc_els_flush_cmd+0x43c/0x670 [lpfc] lpfc_els_flush_all_cmd+0x37/0x60 [lpfc] lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc] lpfc_do_work+0x1485/0x1d70 [lpfc] kthread+0x112/0x130 ret_from_fork+0x1f/0x40 Kernel panic - not syncing: Hard LOCKUP The same CPU tries to claim the phba->port_list_lock twice. Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and lpfc_printf_log() macros before calling lpfc_dmp_dbg(). There is no need to take the phba->port_list_lock within lpfc_dmp_dbg().
A race condition was found in the Linux kernel's watch queue due to a missing lock in pipe_resize_ring(). The specific flaw exists within the handling of pipe buffers. The issue results from the lack of proper locking when performing operations on an object. This flaw allows a local user to crash the system or escalate their privileges on the system.
In the Linux kernel, the following vulnerability has been resolved: cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all() syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that cpuset_attach() is also called from cgroup_attach_task_all(). Add cpus_read_lock() like what cgroup_procs_write_start() does.
An issue was discovered in the Linux kernel through 5.6.11. btree_gc_coalesce in drivers/md/bcache/btree.c has a deadlock if a coalescing operation fails.
In the Linux kernel, the following vulnerability has been resolved: net: nfc: llcp: Add lock when modifying device list The device list needs its associated lock held when modifying it, or the list could become corrupted, as syzbot discovered.
In the Linux kernel, the following vulnerability has been resolved: ocfs2: Avoid touching renamed directory if parent does not change The VFS will not be locking moved directory if its parent does not change. Change ocfs2 rename code to avoid touching renamed directory if its parent does not change as without locking that can corrupt the filesystem.
In the Linux kernel, the following vulnerability has been resolved: serial: imx: fix tx statemachine deadlock When using the serial port as RS485 port, the tx statemachine is used to control the RTS pin to drive the RS485 transceiver TX_EN pin. When the TTY port is closed in the middle of a transmission (for instance during userland application crash), imx_uart_shutdown disables the interface and disables the Transmission Complete interrupt. afer that, imx_uart_stop_tx bails on an incomplete transmission, to be retriggered by the TC interrupt. This interrupt is disabled and therefore the tx statemachine never transitions out of SEND. The statemachine is in deadlock now, and the TX_EN remains low, making the interface useless. imx_uart_stop_tx now checks for incomplete transmission AND whether TC interrupts are enabled before bailing to be retriggered. This makes sure the state machine handling is reached, and is properly set to WAIT_AFTER_SEND.