Linux Kernel 3.10 Released
Linus Torvalds has announced the release of Linux Kernel 3.10:
So I delayed this by a day, considering whether to do another -rc, but decided that there wasn’t enough upside. Sure, it hasn’t been as quiet as I’d like, and we had this long discussion about an inode list locking scalability issue over the last week or two, but in the end that issue turned out to not be new, and while we may end up back-porting the eventual resolution to 3.10, it wasn’t a reason to delay the release.
Similarly, while I might wish for fewer pull requests during the late rc’s (and particularly the ones that came in Friday evening -inconvenient for a weekend release), at some point delaying things doesn’t really help things, and just makes the pent up demand for the next merge window worse.
In other words, I could really have gone either way, but decided that there wasn’t enough reason to break the normal pattern of “rc7 is the last rc before the release”. So here goes..
The appended changelog is (as usual) just the changes since the last rc. This time mainly from the networking pull (which includes drivers and core networking, as well as bluetooth), the rest were pretty small and scattered. We’ve got some arch updates, some acpi/pm fixes, and a scattering of other random fixes..
In the bigger picture (ie since 3.9) this release has been pretty typical and not particularly prone to problems, despite my waffling about the exact release date. As usual, the bulk patch-wise is all drivers (pretty much exactly two thirds), while the rest is evenly split between arch updates and “misc”. No major new subsystems this time around, although there are individual new features. As usual, I’m sure H-Online and kernelnewbies will do better writeups of the details..
Linux 3.9 brought further file system enhancements for Btrfs, XFS and ext4, better LZO compression, power management improvements, ARM SoC updates, and got rid of CONFIG_EXPERIMENTAL.
Linux 3.10 brings the following key changes:
- Timer-free multitasking (nearly tickless operation) – Up to now, Linux has used preemptive multitasking in which a hardware timer fires at regular intervals (“ticks”) and can forcibly pause any program to run an OS routine that decides which task should run next.
This multitasking method can pose problems for the CPUs of laptops and mobile devices, which need periods of inactivity to enter low-power modes. Since the timer fires often (1000 times per second in a typical Linux kernel) even when the system is not doing anything, the CPUs could not save as much power as possible. Virtualization added even more problems, since each VM runs its own timer. This Linux release adds support for not firing the timer (tickless) even when tasks are running. It is not actually fully tickless in this release, as the timer still fires once per second. The full tickless mode is disabled when a CPU runs more than one process, and at least one CPU must keep running with full ticks to allow the other CPUs to go into tickless mode. You can read ‘(Nearly) full tickless operation in 3.10’ and the Documentation for details.
- Bcache, a block layer cache for SSD caching – Bcache allows SSDs to cache other block devices. It supports writeback caching (in addition to write-through caching) and is filesystem agnostic. By default it won’t cache sequential IO, just random reads and writes. It can be used for desktops, servers, high-end storage arrays, and perhaps even embedded systems. For more details read the documentation or visit the wiki.
- Btrfs: smaller extents – Btrfs has incorporated a new key type for metadata extent references that uses disk space more efficiently, reducing the size from 51 bytes to 33 bytes per extent reference for each tree block. In practice, this results in a 30-35% decrease in the size of the extent tree, which means fewer copy-on-write operations and larger parts of the extent tree held in memory, making heavy metadata operations go much faster. It can be enabled with mkfs or with btrfstune -x.
- XFS metadata checksums – Experimental implementation of metadata CRC32c checksums. These metadata checksums are part of a bigger project to implement what the XFS developers have called “self-describing metadata”, which aims at solving verification scalability (fsck takes too long to verify petabyte-scale filesystems with billions of inodes). This feature is experimental and requires using experimental xfsprogs. For more information, you can read the metadata Documentation.
- SysV IPC scalability improvements – Linux used to lock ranges that were much too large, and it used a single IPC lock per IPC semaphore array. Most loads never cared, but some did. This release splits out the locking and adds per-semaphore locks for greater scalability of the IPC semaphore code. Micro benchmarks show improvements of more than 10x in some cases.
- rwsem locking scalability improvements – The rwsem (“reader-writer semaphore”) locking scheme, used in many places in the Linux kernel, had performance problems because of strict, serialized, FIFO write-ownership of the semaphore. In Linux 3.9, an “opportunistic lock stealing” patch was merged to fix this for the slow path. In 3.10, opportunistic lock stealing has also been implemented in the fast path, giving double-digit improvements in pgbench performance in some cases.
- mutex locking scalability improvements – The mutex locking scheme, used widely in the Linux kernel, now scales better thanks to the use of fewer atomic operations and some queuing changes that reduce cacheline contention.
- TCP optimization: Tail loss probe – This release adds the TCP Tail loss probe algorithm which aims at reducing tail latency of short transactions.
- ARM big.LITTLE support – Support for big.LITTLE processing has been added to 3.10. See commit.
- MIPS KVM support – KVM/MIPS supports MIPS32R2 and beyond. Read the release notes for details. See commit.
- tracing: tracing snapshots, stack tracing – The tracing framework now supports several tracing buffers, which can be used to take snapshots of the main tracing buffer. These tracing snapshots can be triggered manually or with function probes. It’s also possible to have a stack trace recorded in the ring buffer when a given function is called.
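To put the tick rates from the timer-free multitasking item in perspective, here is a rough back-of-the-envelope sketch in Python. It is not kernel code: the function name and the simple model (every tickless CPU runs exactly one task, every other CPU keeps the full 1000 Hz tick) are assumptions made purely for illustration.

```python
# Illustrative model only; the constants come from the text above,
# everything else (names, the one-task-per-CPU assumption) is hypothetical.
HZ = 1000          # periodic tick rate of a typical kernel, per the text
RESIDUAL_HZ = 1    # nearly tickless CPUs in 3.10 still tick once per second


def timer_interrupts(seconds, cpus, tickless_cpus):
    """Return (periodic, nearly_tickless) tick counts summed over all CPUs."""
    # At least one CPU has to keep the full tick so the others can go tickless.
    if not 0 <= tickless_cpus < cpus:
        raise ValueError("at least one CPU must keep the full tick")
    periodic = HZ * seconds * cpus
    full_tick_cpus = cpus - tickless_cpus
    nearly_tickless = seconds * (HZ * full_tick_cpus + RESIDUAL_HZ * tickless_cpus)
    return periodic, nearly_tickless


if __name__ == "__main__":
    before, after = timer_interrupts(seconds=60, cpus=4, tickless_cpus=3)
    print(f"periodic ticks: {before}, nearly tickless: {after}")
```

Under these assumptions, a four-CPU machine with three tickless CPUs handles roughly a quarter of the timer interrupts it would with a periodic tick on every CPU.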
Further details on Linux 3.10 are available on Kernelnewbies.org.
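As a quick sanity check on the Btrfs figures quoted above, the following Python sketch computes the per-reference saving from the 51-byte and 33-byte sizes. The function name and the example tree-block count are hypothetical, and the model ignores everything in the extent tree other than metadata extent references.

```python
# Sizes taken from the Btrfs item above; the rest is an illustrative assumption.
OLD_REF_BYTES = 51   # metadata extent reference size before 3.10
NEW_REF_BYTES = 33   # size with the new key type


def extent_ref_savings(tree_blocks):
    """Return (bytes_saved, fraction_saved) for the given number of tree blocks."""
    old_total = OLD_REF_BYTES * tree_blocks
    new_total = NEW_REF_BYTES * tree_blocks
    saved = old_total - new_total
    return saved, saved / old_total


if __name__ == "__main__":
    saved, fraction = extent_ref_savings(tree_blocks=1_000_000)
    print(f"saved {saved} bytes ({fraction:.1%} of reference space)")
```

The per-reference reduction works out to 18/51, a little over 35%, which sits at the upper end of the 30-35% decrease reported for the extent tree as a whole.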