In the Linux kernel, the following vulnerability has been resolved: media: ti: j721e-csi2rx: Fix races while restarting DMA After the frame is submitted to DMA, it may happen that the submitted list is not updated soon enough, and the DMA callback is triggered before that. This can lead to kernel crashes, so move everything in a single lock/unlock section to prevent such races.
In the Linux kernel, the following vulnerability has been resolved: accel/ivpu: Fix race condition when unbinding BOs Fix 'Memory manager not clean during takedown' warning that occurs when ivpu_gem_bo_free() removes the BO from the BOs list before it gets unmapped. Then file_priv_unbind() triggers a warning in drm_mm_takedown() during context teardown. Protect the unmapping sequence with bo_list_lock to ensure the BO is always fully unmapped when removed from the list. This ensures the BO is either fully unmapped at context teardown time or present on the list and unmapped by file_priv_unbind().
In the Linux kernel, the following vulnerability has been resolved: timers: Fix NULL function pointer race in timer_shutdown_sync() There is a race condition between timer_shutdown_sync() and timer expiration that can lead to hitting a WARN_ON in expire_timers(). The issue occurs when timer_shutdown_sync() clears the timer function to NULL while the timer is still running on another CPU. The race scenario looks like this: CPU0 CPU1 <SOFTIRQ> lock_timer_base() expire_timers() base->running_timer = timer; unlock_timer_base() [call_timer_fn enter] mod_timer() ... timer_shutdown_sync() lock_timer_base() // For now, will not detach the timer but only clear its function to NULL if (base->running_timer != timer) ret = detach_if_pending(timer, base, true); if (shutdown) timer->function = NULL; unlock_timer_base() [call_timer_fn exit] lock_timer_base() base->running_timer = NULL; unlock_timer_base() ... // Now timer is pending while its function set to NULL. // next timer trigger <SOFTIRQ> expire_timers() WARN_ON_ONCE(!fn) // hit ... lock_timer_base() // Now timer will detach if (base->running_timer != timer) ret = detach_if_pending(timer, base, true); if (shutdown) timer->function = NULL; unlock_timer_base() The problem is that timer_shutdown_sync() clears the timer function regardless of whether the timer is currently running. This can leave a pending timer with a NULL function pointer, which triggers the WARN_ON_ONCE(!fn) check in expire_timers(). Fix this by only clearing the timer function when actually detaching the timer. If the timer is running, leave the function pointer intact, which is safe because the timer will be properly detached when it finishes running.
In the Linux kernel, the following vulnerability has been resolved: netrom: Fix data-races around sysctl_net_busy_read We need to protect the reader reading the sysctl value because the value can be changed concurrently.
In the Linux kernel, the following vulnerability has been resolved: wireguard: receive: annotate data-race around receiving_counter.counter Syzkaller with KCSAN identified a data-race issue when accessing keypair->receiving_counter.counter. Use READ_ONCE() and WRITE_ONCE() annotations to mark the data race as intentional. BUG: KCSAN: data-race in wg_packet_decrypt_worker / wg_packet_rx_poll write to 0xffff888107765888 of 8 bytes by interrupt on cpu 0: counter_validate drivers/net/wireguard/receive.c:321 [inline] wg_packet_rx_poll+0x3ac/0xf00 drivers/net/wireguard/receive.c:461 __napi_poll+0x60/0x3b0 net/core/dev.c:6536 napi_poll net/core/dev.c:6605 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6738 __do_softirq+0xc4/0x279 kernel/softirq.c:553 do_softirq+0x5e/0x90 kernel/softirq.c:454 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x6c5/0x700 drivers/net/wireguard/receive.c:499 process_one_work kernel/workqueue.c:2633 [inline] ... read to 0xffff888107765888 of 8 bytes by task 3196 on cpu 1: decrypt_packet drivers/net/wireguard/receive.c:252 [inline] wg_packet_decrypt_worker+0x220/0x700 drivers/net/wireguard/receive.c:501 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 ...
In the Linux kernel, the following vulnerability has been resolved: hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the VMBus channel"), napi_disable was getting called for all channels, including all subchannels without confirming if they are enabled or not. This caused hv_netvsc getting hung at napi_disable, when netvsc_probe() has finished running but nvdev->subchan_work has not started yet. netvsc_subchan_work() -> rndis_set_subchannel() has not created the sub-channels and because of that netvsc_sc_open() is not running. netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which netvsc_subchan_work did not run. netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the opposite. Now during netvsc_device_remove(), when napi_disable is called for those subchannels, napi_disable gets stuck on infinite msleep. This fix addresses this problem by ensuring that napi_disable() is not getting called for non-enabled NAPI struct. But netif_napi_del() is still necessary for these non-enabled NAPI struct for cleanup purpose. Call trace: [ 654.559417] task:modprobe state:D stack: 0 pid: 2321 ppid: 1091 flags:0x00004002 [ 654.568030] Call Trace: [ 654.571221] <TASK> [ 654.573790] __schedule+0x2d6/0x960 [ 654.577733] schedule+0x69/0xf0 [ 654.581214] schedule_timeout+0x87/0x140 [ 654.585463] ? __bpf_trace_tick_stop+0x20/0x20 [ 654.590291] msleep+0x2d/0x40 [ 654.593625] napi_disable+0x2b/0x80 [ 654.597437] netvsc_device_remove+0x8a/0x1f0 [hv_netvsc] [ 654.603935] rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc] [ 654.611101] ? do_wait_intr+0xb0/0xb0 [ 654.615753] netvsc_remove+0x7c/0x120 [hv_netvsc] [ 654.621675] vmbus_remove+0x27/0x40 [hv_vmbus]
In the Linux kernel, the following vulnerability has been resolved: f2fs: fix to truncate meta inode pages forcely Below race case can cause data corruption: Thread A GC thread - gc_data_segment - ra_data_block - locked meta_inode page - f2fs_inplace_write_data - invalidate_mapping_pages : fail to invalidate meta_inode page due to lock failure or dirty|writeback status - f2fs_submit_page_bio : write last dirty data to old blkaddr - move_data_block - load old data from meta_inode page - f2fs_submit_page_write : write old data to new blkaddr Because invalidate_mapping_pages() will skip invalidating page which has unclear status including locked, dirty, writeback and so on, so we need to use truncate_inode_pages_range() instead of invalidate_mapping_pages() to make sure meta_inode page will be dropped.
In the Linux kernel, the following vulnerability has been resolved: blk-mq: fix IO hang from sbitmap wakeup race In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered with the following blk_mq_get_driver_tag() in case of getting driver tag failure. Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime blk_mq_mark_tag_wait() can't get driver tag successfully. This issue can be reproduced by running the following test in loop, and fio hang can be observed in < 30min when running it on my test VM in laptop. modprobe -r scsi_debug modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4 dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename` fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \ --runtime=100 --numjobs=40 --time_based --name=test \ --ioengine=libaio Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which is just fine in case of running out of tag.
In the Linux kernel, the following vulnerability has been resolved: tls: fix race between tx work scheduling and socket close Similarly to previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do.
In the Linux kernel, the following vulnerability has been resolved: s390/cio: fix race condition during online processing A race condition exists in ccw_device_set_online() that can cause the online process to fail, leaving the affected device in an inconsistent state. As a result, subsequent attempts to set that device online fail with return code ENODEV. The problem occurs when a path verification request arrives after a wait for final device state completed, but before the result state is evaluated. Fix this by ensuring that the CCW-device lock is held between determining final state and checking result state. Note that since: commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") path verification requests are much more likely to occur during boot, resulting in an increased chance of this race condition occurring.
In the Linux kernel, the following vulnerability has been resolved: af_unix: Fix garbage collector racing against connect() Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children. Leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list. sockets are AF_UNIX/SOCK_STREAM S is an unconnected socket L is a listening in-flight socket bound to addr, not in fdtable V's fd will be passed via sendmsg(), gets inflight count bumped connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc() ---------------- ------------------------- ----------- NS = unix_create1() skb1 = sock_wmalloc(NS) L = unix_find_other(addr) unix_state_lock(L) unix_peer(S) = NS // V count=1 inflight=0 NS = unix_peer(S) skb2 = sock_alloc() skb_queue_tail(NS, skb2[V]) // V became in-flight // V count=2 inflight=1 close(V) // V count=1 inflight=1 // GC candidate condition met for u in gc_inflight_list: if (total_refs == inflight_refs) add u to gc_candidates // gc_candidates={L, V} for u in gc_candidates: scan_children(u, dec_inflight) // embryo (skb1) was not // reachable from L yet, so V's // inflight remains unchanged __skb_queue_tail(L, skb1) unix_state_unlock(L) for u in gc_candidates: if (u.inflight) scan_children(u, inc_inflight_move_tail) // V count=1 inflight=2 (!) If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it can not possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen because unix_gc_lock is already taken. Inflight graph remains unaffected.
In the Linux kernel, the following vulnerability has been resolved: mptcp: fix data races on remote_id Similar to the previous patch, address the data race on remote_id, adding the suitable ONCE annotations.
In the Linux kernel, the following vulnerability has been resolved: dmaengine: dw-edma: eDMA: Add sync read before starting the DMA transfer in remote setup The Linked list element and pointer are not stored in the same memory as the eDMA controller register. If the doorbell register is toggled before the full write of the linked list a race condition error will occur. In remote setup we can only use a readl to the memory to assure the full write has occurred.
In the Linux kernel, the following vulnerability has been resolved: netfilter: ipset: fix performance regression in swap operation The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback.
In the Linux kernel, the following vulnerability has been resolved: quota: Fix potential NULL pointer dereference Below race may cause NULL pointer dereference P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) .... If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered. So let's fix it by using a temporary pointer to avoid this issue.
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race when detecting delalloc ranges during fiemap For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc). This however introduced a race that makes us miss delalloc ranges for file regions that are currently holes, so the caller of fiemap will not be aware that there's data for some file regions. This can be quite serious for some use cases - for example in coreutils versions before 9.0, the cp program used fiemap to detect holes and data in the source file, copying only regions with data (extents or delalloc) from the source file to the destination file in order to preserve holes (see the documentation for its --sparse command line option). This means that if cp was used with a source file that had delalloc in a hole, the destination file could end up without that data, which is effectively a data loss issue, if it happened to hit the race described below. The race happens like this: 1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that has delalloc in the file range [64M, 65M[, which is currently a hole; 2) Fiemap locks the inode in shared mode, then starts iterating the inode's subvolume tree searching for file extent items, without having the whole fiemap target range locked in the inode's io tree - the change introduced recently by commit b0ad381fa769 ("btrfs: fix deadlock with fiemap and extent locking"). It only locks ranges in the io tree when it finds a hole or prealloc extent since that commit; 3) Note that fiemap clones each leaf before using it, and this is to avoid deadlocks when locking a file range in the inode's io tree and the fiemap buffer is memory mapped to some file, because writing to the page with btrfs_page_mkwrite() will wait on any ordered extent for the page's range and the ordered extent needs to lock the range and may need to modify the same leaf, therefore leading to a deadlock on the leaf; 4) While iterating the file extent items in the cloned leaf before finding the hole in the range [64M, 65M[, the delalloc in that range is flushed and its ordered extent completes - meaning the corresponding file extent item is in the inode's subvolume tree, but not present in the cloned leaf that fiemap is iterating over; 5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in the cloned leaf (or a file extent item with disk_bytenr == 0 in case the NO_HOLES feature is not enabled), it will lock that file range in the inode's io tree and then search for delalloc by checking for the EXTENT_DELALLOC bit in the io tree for that range and ordered extents (with btrfs_find_delalloc_in_range()). But it finds nothing since the delalloc in that range was already flushed and the ordered extent completed and is gone - as a result fiemap will not report that there's delalloc or an extent for the range [64M, 65M[, so user space will be mislead into thinking that there's a hole in that range. This could actually be sporadically triggered with test case generic/094 from fstests, which reports a missing extent/delalloc range like this: generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad) --- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000 @@ -1,3 +1,9 @@ QA output created by 094 fiemap run with sync fiemap run without sync +ERROR: couldn't find extent at 7 +map is 'HHDDHPPDPHPH' +logical: [ 5.. 6] phys: ---truncated---
In the Linux kernel, the following vulnerability has been resolved: KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the loads and stores are atomic. In the extremely unlikely scenario the compiler tears the stores, it's theoretically possible for KVM to attempt to get a vCPU using an out-of-bounds index, e.g. if the write is split into multiple 8-bit stores, and is paired with a 32-bit load on a VM with 257 vCPUs: CPU0 CPU1 last_boosted_vcpu = 0xff; (last_boosted_vcpu = 0x100) last_boosted_vcpu[15:8] = 0x01; i = (last_boosted_vcpu = 0x1ff) last_boosted_vcpu[7:0] = 0x00; vcpu = kvm->vcpu_array[0x1ff]; As detected by KCSAN: BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm] write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16: kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:? arch/x86/kvm/vmx/vmx.c:6606) kvm_intel vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890) __x64_sys_ioctl (fs/ioctl.c:890) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4: kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:? arch/x86/kvm/vmx/vmx.c:6606) kvm_intel vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890) __x64_sys_ioctl (fs/ioctl.c:890) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) value changed: 0x00000012 -> 0x00000000
In the Linux kernel, the following vulnerability has been resolved: bonding: fix oops during rmmod "rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function"). Here are the relevant functions being called: bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); bonding_debug_root = NULL; <--------- SET TO NULL HERE bond_netlink_fini() rtnl_link_unregister() __rtnl_link_unregister() unregister_netdevice_many_notify() bond_uninit() bond_debug_unregister() (commit removed check for bonding_debug_root == NULL) debugfs_remove() simple_recursive_removal() down_write() -> OOPS However, reverting the bad commit does not solve the problem completely because the original code contains a race that could cause the same oops, although it was much less likely to be triggered unintentionally: CPU1 rmmod bonding bonding_exit() bond_destroy_debugfs() debugfs_remove_recursive(bonding_debug_root); CPU2 echo -bond0 > /sys/class/net/bonding_masters bond_uninit() bond_debug_unregister() if (!bonding_debug_root) CPU1 bonding_debug_root = NULL; So do NOT revert the bad commit (since the removed checks were racy anyway), and instead change the order of actions taken during module removal. The same oops can also happen if there is an error during module init, so apply the same fix there.
In the Linux kernel, the following vulnerability has been resolved: spi: tegra210-quad: Protect curr_xfer check in IRQ handler Now that all other accesses to curr_xfer are done under the lock, protect the curr_xfer NULL check in tegra_qspi_isr_thread() with the spinlock. Without this protection, the following race can occur: CPU0 (ISR thread) CPU1 (timeout path) ---------------- ------------------- if (!tqspi->curr_xfer) // sees non-NULL spin_lock() tqspi->curr_xfer = NULL spin_unlock() handle_*_xfer() spin_lock() t = tqspi->curr_xfer // NULL! ... t->len ... // NULL dereference! With this patch, all curr_xfer accesses are now properly synchronized. Although all accesses to curr_xfer are done under the lock, in tegra_qspi_isr_thread() it checks for NULL, releases the lock and reacquires it later in handle_cpu_based_xfer()/handle_dma_based_xfer(). There is a potential for an update in between, which could cause a NULL pointer dereference. To handle this, add a NULL check inside the handlers after acquiring the lock. This ensures that if the timeout path has already cleared curr_xfer, the handler will safely return without dereferencing the NULL pointer.
A race condition was found in the Linux kernel's scsi device driver in lpfc_unregister_fcf_rescan() function. This can result in a null pointer dereference issue, possibly leading to a kernel panic or denial of service issue.
A race condition was found in the Linux kernel's media/dvb-core in dvbdmx_write()Â function. This can result in a null pointer dereference issue, possibly leading to a kernel panic or denial of service issue.
A race condition was found in the Linux kernel's sound/hda device driver in snd_hdac_regmap_sync() function. This can result in a null pointer dereference issue, possibly leading to a kernel panic or denial of service issue.
A race condition was found in the Linux kernel's drm/exynos device driver in exynos_drm_crtc_atomic_disable() function. This can result in a null pointer dereference issue, possibly leading to a kernel panic or denial of service issue.
In the Linux kernel, the following vulnerability has been resolved: mm/ksm: fix race with VMA iteration and mm_struct teardown exit_mmap() will tear down the VMAs and maple tree with the mmap_lock held in write mode. Ensure that the maple tree is still valid by checking ksm_test_exit() after taking the mmap_lock in read mode, but before the for_each_vma() iterator dereferences a destroyed maple tree. Since the maple tree is destroyed, the flags telling lockdep to check an external lock has been cleared. Skip the for_each_vma() iterator to avoid dereferencing a maple tree without the external lock flag, which would create a lockdep warning.
In the Linux kernel, the following vulnerability has been resolved: ksmbd: Fix race condition in RPC handle list access The 'sess->rpc_handle_list' XArray manages RPC handles within a ksmbd session. Access to this list is intended to be protected by 'sess->rpc_lock' (an rw_semaphore). However, the locking implementation was flawed, leading to potential race conditions. In ksmbd_session_rpc_open(), the code incorrectly acquired only a read lock before calling xa_store() and xa_erase(). Since these operations modify the XArray structure, a write lock is required to ensure exclusive access and prevent data corruption from concurrent modifications. Furthermore, ksmbd_session_rpc_method() accessed the list using xa_load() without holding any lock at all. This could lead to reading inconsistent data or a potential use-after-free if an entry is concurrently removed and the pointer is dereferenced. Fix these issues by: 1. Using down_write() and up_write() in ksmbd_session_rpc_open() to ensure exclusive access during XArray modification, and ensuring the lock is correctly released on error paths. 2. Adding down_read() and up_read() in ksmbd_session_rpc_method() to safely protect the lookup.
An issue was discovered in the Linux kernel through 6.0.9. drivers/media/usb/ttusb-dec/ttusb_dec.c has a memory leak because of the lack of a dvb_frontend_detach call.
roccat_report_event in drivers/hid/hid-roccat.c in the Linux kernel through 5.19.12 has a race condition and resultant use-after-free in certain situations where a report is received while copying a report->value is in progress.
In the Linux kernel, the following vulnerability has been resolved: scsi: target: iscsi: Fix a race condition between login_work and the login thread In case a malicious initiator sends some random data immediately after a login PDU; the iscsi_target_sk_data_ready() callback will schedule the login_work and, at the same time, the negotiation may end without clearing the LOGIN_FLAGS_INITIAL_PDU flag (because no additional PDU exchanges are required to complete the login). The login has been completed but the login_work function will find the LOGIN_FLAGS_INITIAL_PDU flag set and will never stop from rescheduling itself; at this point, if the initiator drops the connection, the iscsit_conn structure will be freed, login_work will dereference a released socket structure and the kernel crashes. BUG: kernel NULL pointer dereference, address: 0000000000000230 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page Workqueue: events iscsi_target_do_login_rx [iscsi_target_mod] RIP: 0010:_raw_read_lock_bh+0x15/0x30 Call trace: iscsi_target_do_login_rx+0x75/0x3f0 [iscsi_target_mod] process_one_work+0x1e8/0x3c0 Fix this bug by forcing login_work to stop after the login has been completed and the socket callbacks have been restored. Add a comment to clearify the return values of iscsi_target_do_login()
An issue was discovered in include/asm-generic/tlb.h in the Linux kernel before 5.19. Because of a race condition (unmap_mapping_range versus munmap), a device driver can free a page while it still has stale TLB entries. This only occurs in situations with VM_PFNMAP VMAs.
An issue was discovered in the Linux kernel through 5.19.8. drivers/firmware/efi/capsule-loader.c has a race condition with a resultant use-after-free.
In the Linux kernel, the following vulnerability has been resolved: fscache: Fix oops due to race with cookie_lru and use_cookie If a cookie expires from the LRU and the LRU_DISCARD flag is set, but the state machine has not run yet, it's possible another thread can call fscache_use_cookie and begin to use it. When the cookie_worker finally runs, it will see the LRU_DISCARD flag set, transition the cookie->state to LRU_DISCARDING, which will then withdraw the cookie. Once the cookie is withdrawn the object is removed the below oops will occur because the object associated with the cookie is now NULL. Fix the oops by clearing the LRU_DISCARD bit if another thread uses the cookie before the cookie_worker runs. BUG: kernel NULL pointer dereference, address: 0000000000000008 ... CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles] ... Call Trace: netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs] process_one_work+0x217/0x3e0 worker_thread+0x4a/0x3b0 kthread+0xd6/0x100
In the Linux kernel, the following vulnerability has been resolved: tcp: Fix data-races around sysctl_tcp_min_snd_mss. While reading sysctl_tcp_min_snd_mss, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
A race condition flaw was found in the Linux kernel sound subsystem due to improper locking. It could lead to a NULL pointer dereference while handling the SNDCTL_DSP_SYNC ioctl. A privileged local user (root or member of the audio group) could use this flaw to crash the system, resulting in a denial of service condition
In the Linux kernel, the following vulnerability has been resolved: tls: fix race between async notify and socket close The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data. Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization. Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires.
In the Linux kernel, the following vulnerability has been resolved: fs/fhandle.c: fix a race in call of has_locked_children() may_decode_fh() is calling has_locked_children() while holding no locks. That's an oopsable race... The rest of the callers are safe since they are holding namespace_sem and are guaranteed a positive refcount on the mount in question. Rename the current has_locked_children() to __has_locked_children(), make it static and switch the fs/namespace.c users to it. Make has_locked_children() a wrapper for __has_locked_children(), calling the latter under read_seqlock_excl(&mount_lock).
In the Linux kernel, the following vulnerability has been resolved: ksmbd: fix Preauh_HashValue race condition If client send multiple session setup requests to ksmbd, Preauh_HashValue race condition could happen. There is no need to free sess->Preauh_HashValue at session setup phase. It can be freed together with session at connection termination phase.
In the Linux kernel, the following vulnerability has been resolved: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race huge_pmd_unshare() drops a reference on a page table that may have previously been shared across processes, potentially turning it into a normal page table used in another process in which unrelated VMAs can afterwards be installed. If this happens in the middle of a concurrent gup_fast(), gup_fast() could end up walking the page tables of another process. While I don't see any way in which that immediately leads to kernel memory corruption, it is really weird and unexpected. Fix it with an explicit broadcast IPI through tlb_remove_table_sync_one(), just like we do in khugepaged when removing page tables for a THP collapse.
In the Linux kernel, the following vulnerability has been resolved: s390/qeth: fix deadlock during failing recovery Commit 0b9902c1fcc5 ("s390/qeth: fix deadlock during recovery") removed taking discipline_mutex inside qeth_do_reset(), fixing potential deadlocks. An error path was missed though, that still takes discipline_mutex and thus has the original deadlock potential. Intermittent deadlocks were seen when a qeth channel path is configured offline, causing a race between qeth_do_reset and ccwgroup_remove. Call qeth_set_offline() directly in the qeth_do_reset() error case and then a new variant of ccwgroup_set_offline(), without taking discipline_mutex.
In the Linux kernel, the following vulnerability has been resolved: net: bridge: switchdev: Skip MDB replays of deferred events on offload Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration. While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event. The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed. This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario. To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge. For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it: root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0 And then destroy the bridge: root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$ The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed. Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled: root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1 All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0). Eliminate the race in two steps: 1. Grab the write-side lock of the MDB while generating the replay list. This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore: 2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions.
In the Linux kernel, the following vulnerability has been resolved: nfsd: Fix nsfd startup race (again) Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc
In the Linux kernel, the following vulnerability has been resolved: btrfs: use latest_dev in btrfs_show_devname The test case btrfs/238 reports the warning below: WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs] CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_show_devname+0x108/0x1b4 [btrfs] show_mountinfo+0x234/0x2c4 m_show+0x28/0x34 seq_read_iter+0x12c/0x3c4 vfs_read+0x29c/0x2c8 ksys_read+0x80/0xec __arm64_sys_read+0x28/0x34 invoke_syscall+0x50/0xf8 do_el0_svc+0x88/0x138 el0_svc+0x2c/0x8c el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x198/0x19c Reason: While btrfs_prepare_sprout() moves the fs_devices::devices into fs_devices::seed_list, the btrfs_show_devname() searches for the devices and found none, leading to the warning as in above. Fix: latest_dev is updated according to the changes to the device list. That means we could use the latest_dev->name to show the device name in /proc/self/mounts, the pointer will be always valid as it's assigned before the device is deleted from the list in remove or replace. The RCU protection is sufficient as the device structure is freed after synchronization.
In the Linux kernel, the following vulnerability has been resolved: io-wq: check for wq exit after adding new worker task_work We check IO_WQ_BIT_EXIT before attempting to create a new worker, and wq exit cancels pending work if we have any. But it's possible to have a race between the two, where creation checks exit finding it not set, but we're in the process of exiting. The exit side will cancel pending creation task_work, but there's a gap where we add task_work after we've canceled existing creations at exit time. Fix this by checking the EXIT bit post adding the creation task_work. If it's set, run the same cancelation that exit does.
In the Linux kernel, the following vulnerability has been resolved: userfaultfd: fix a race between writeprotect and exit_mmap() A race is possible when a process exits, its VMAs are removed by exit_mmap() and at the same time userfaultfd_writeprotect() is called. The race was detected by KASAN on a development kernel, but it appears to be possible on vanilla kernels as well. Use mmget_not_zero() to prevent the race as done in other userfaultfd operations.
In the Linux kernel, the following vulnerability has been resolved: ocfs2: fix race between searching chunks and release journal_head from buffer_head Encountered a race between ocfs2_test_bg_bit_allocatable() and jbd2_journal_put_journal_head() resulting in the below vmcore. PID: 106879 TASK: ffff880244ba9c00 CPU: 2 COMMAND: "loop3" Call trace: panic oops_end no_context __bad_area_nosemaphore bad_area_nosemaphore __do_page_fault do_page_fault page_fault [exception RIP: ocfs2_block_group_find_clear_bits+316] ocfs2_block_group_find_clear_bits [ocfs2] ocfs2_cluster_group_search [ocfs2] ocfs2_search_chain [ocfs2] ocfs2_claim_suballoc_bits [ocfs2] __ocfs2_claim_clusters [ocfs2] ocfs2_claim_clusters [ocfs2] ocfs2_local_alloc_slide_window [ocfs2] ocfs2_reserve_local_alloc_bits [ocfs2] ocfs2_reserve_clusters_with_limit [ocfs2] ocfs2_reserve_clusters [ocfs2] ocfs2_lock_refcount_allocators [ocfs2] ocfs2_make_clusters_writable [ocfs2] ocfs2_replace_cow [ocfs2] ocfs2_refcount_cow [ocfs2] ocfs2_file_write_iter [ocfs2] lo_rw_aio loop_queue_work kthread_worker_fn kthread ret_from_fork When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and released the jounal head from the buffer head. Needed to take bit lock for the bit 'BH_JournalHead' to fix this race.
In the Linux kernel, the following vulnerability has been resolved: udp: fix race between close() and udp_abort() Kaustubh reported and diagnosed a panic in udp_lib_lookup(). The root cause is udp_abort() racing with close(). Both racing functions acquire the socket lock, but udp{v6}_destroy_sock() release it before performing destructive actions. We can't easily extend the socket lock scope to avoid the race, instead use the SOCK_DEAD flag to prevent udp_abort from doing any action when the critical race happens. Diagnosed-and-tested-by: Kaustubh Pandey <kapandey@codeaurora.org>
In the Linux kernel, the following vulnerability has been resolved: f2fs: compress: fix race condition of overwrite vs truncate pos_fsstress testcase complains a panic as belew: ------------[ cut here ]------------ kernel BUG at fs/f2fs/compress.c:1082! invalid opcode: 0000 [#1] SMP PTI CPU: 4 PID: 2753477 Comm: kworker/u16:2 Tainted: G OE 5.12.0-rc1-custom #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: writeback wb_workfn (flush-252:16) RIP: 0010:prepare_compress_overwrite+0x4c0/0x760 [f2fs] Call Trace: f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs] f2fs_write_cache_pages+0x468/0x8a0 [f2fs] f2fs_write_data_pages+0x2a4/0x2f0 [f2fs] do_writepages+0x38/0xc0 __writeback_single_inode+0x44/0x2a0 writeback_sb_inodes+0x223/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x290 wb_workfn+0x309/0x500 process_one_work+0x220/0x3c0 worker_thread+0x53/0x420 kthread+0x12f/0x150 ret_from_fork+0x22/0x30 The root cause is truncate() may race with overwrite as below, so that one reference count left in page can not guarantee the page attaching in mapping tree all the time, after truncation, later find_lock_page() may return NULL pointer. - prepare_compress_overwrite - f2fs_pagecache_get_page - unlock_page - f2fs_setattr - truncate_setsize - truncate_inode_page - delete_from_page_cache - find_lock_page Fix this by avoiding referencing updated page.
In the Linux kernel, the following vulnerability has been resolved: net/smc: fix kernel panic caused by race of smc_sock A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it. [ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88 [ 4570.696048] #PF: supervisor write access in kernel mode [ 4570.696728] #PF: error_code(0x0002) - not-present page [ 4570.697401] PGD 0 P4D 0 [ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111 [ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0 [ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30 <...> [ 4570.711446] Call Trace: [ 4570.711746] <IRQ> [ 4570.711992] smc_cdc_tx_handler+0x41/0xc0 [ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560 [ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10 [ 4570.713489] tasklet_action_common.isra.17+0x66/0x140 [ 4570.714083] __do_softirq+0x123/0x2f4 [ 4570.714521] irq_exit_rcu+0xc4/0xf0 [ 4570.714934] common_interrupt+0xba/0xe0 Though smc_cdc_tx_handler() checked the existence of smc connection, smc_release() may have already dismissed and released the smc socket before smc_cdc_tx_handler() further visits it. smc_cdc_tx_handler() |smc_release() if (!conn) | | |smc_cdc_tx_dismiss_slots() | smc_cdc_tx_dismisser() | |sock_put(&smc->sk) <- last sock_put, | smc_sock freed bh_lock_sock(&smc->sk) (panic) | To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC message(posted to the QP but haven't received related CQE), and don't release the smc_connection until all the inflight CDC messages haven been done, for both success or failed ones. Using refcount on CDC messages brings another problem: when the link is going to be destroyed, smcr_link_clear() will reset the QP, which then remove all the pending CQEs related to the QP in the CQ. To make sure all the CQEs will always come back so the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error(). And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we need to wait for all pending WQEs done, or we may encounter use-after- free when handling CQEs. For IB device removal routine, we need to wait for all the QPs on that device been destroyed before we can destroy CQs on the device, or the refcount on smc_connection won't reach 0 and smc_sock cannot be released.
In the Linux kernel, the following vulnerability has been resolved: gpio: aggregator: protect driver attr handlers against module unload Both new_device_store and delete_device_store touch module global resources (e.g. gpio_aggregator_lock). To prevent race conditions with module unload, a reference needs to be held. Add try_module_get() in these handlers. For new_device_store, this eliminates what appears to be the most dangerous scenario: if an id is allocated from gpio_aggregator_idr but platform_device_register has not yet been called or completed, a concurrent module unload could fail to unregister/delete the device, leaving behind a dangling platform device/GPIO forwarder. This can result in various issues. The following simple reproducer demonstrates these problems: #!/bin/bash while :; do # note: whether 'gpiochip0 0' exists or not does not matter. echo 'gpiochip0 0' > /sys/bus/platform/drivers/gpio-aggregator/new_device done & while :; do modprobe gpio-aggregator modprobe -r gpio-aggregator done & wait Starting with the following warning, several kinds of warnings will appear and the system may become unstable: ------------[ cut here ]------------ list_del corruption, ffff888103e2e980->next is LIST_POISON1 (dead000000000100) WARNING: CPU: 1 PID: 1327 at lib/list_debug.c:56 __list_del_entry_valid_or_report+0xa3/0x120 [...] RIP: 0010:__list_del_entry_valid_or_report+0xa3/0x120 [...] Call Trace: <TASK> ? __list_del_entry_valid_or_report+0xa3/0x120 ? __warn.cold+0x93/0xf2 ? __list_del_entry_valid_or_report+0xa3/0x120 ? report_bug+0xe6/0x170 ? __irq_work_queue_local+0x39/0xe0 ? handle_bug+0x58/0x90 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? __list_del_entry_valid_or_report+0xa3/0x120 gpiod_remove_lookup_table+0x22/0x60 new_device_store+0x315/0x350 [gpio_aggregator] kernfs_fop_write_iter+0x137/0x1f0 vfs_write+0x262/0x430 ksys_write+0x60/0xd0 do_syscall_64+0x6c/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] </TASK> ---[ end trace 0000000000000000 ]---
A pivot_root race condition in fs/namespace.c in the Linux kernel 4.4.x before 4.4.221, 4.9.x before 4.9.221, 4.14.x before 4.14.178, 4.19.x before 4.19.119, and 5.x before 5.3 allows local users to cause a denial of service (panic) by corrupting a mountpoint reference counter.
In the Linux kernel, the following vulnerability has been resolved: net_sched: prio: fix a race in prio_tune() Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer fires at the wrong time. The race is as follows: CPU 0 CPU 1 [1]: lock root [2]: qdisc_tree_flush_backlog() [3]: unlock root | | [5]: lock root | [6]: rehash | [7]: qdisc_tree_reduce_backlog() | [4]: qdisc_put() This can be abused to underflow a parent's qlen. Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog() should fix the race, because all packets will be purged from the qdisc before releasing the lock.