Key Innovations from FAST '25 Full Papers
This report was generated by OpenAI Deep Research.
Fast, Transparent Filesystem Microkernel Recovery with Ananke
Main Innovation
Introduces Ananke, a microkernel-based filesystem service that can transparently recover from crashes at the process level. Ananke leverages the microkernel architecture to record critical state on process crashes, enabling fast recovery (on the order of milliseconds) without data loss.
Academic Significance
Demonstrates a new approach to filesystem fault tolerance by containing failures to a microservice and bridging the “state gap” between what’s in memory vs. on disk at crash time. It advances research on crash recovery by showing microkernel servers can restart quickly with minimal overhead and full state restoration, improving reliability in theory and practice.
Industry Significance
Provides a practical way to increase uptime for storage services. In data centers, a filesystem crash typically requires a full reboot or complex recovery; Ananke instead enables rapid automatic recovery of just the filesystem process, preserving open file states. This can lead to higher availability in storage appliances or OS kernels used in industry.
Boosting File Systems Elegantly: A Transparent NVM Write-Ahead Log for Disk File Systems
Main Innovation
Proposes NVLog, a write-ahead logging layer using byte-addressable Non-Volatile Memory (NVM) to speed up traditional disk filesystems. NVLog absorbs synchronous writes in NVM at fine granularity and defers or batches expensive disk writes, all transparently under existing filesystems. It introduces efficient log structures and crash consistency mechanisms to integrate NVM without migrating the entire FS to NVM.
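The absorb-then-batch idea can be sketched in a few lines. This is a toy model, not NVLog's implementation: a Python list stands in for the NVM log, a dict for the disk filesystem, and the names (`NVLogSketch`, `sync_write`, `checkpoint`) are illustrative.

```python
class NVLogSketch:
    """Toy NVM write-ahead log: a synchronous write appends a record to
    'NVM'; expensive disk writes are deferred and applied in batches."""
    def __init__(self, batch_size=4):
        self.nvm_log = []        # stands in for byte-addressable NVM
        self.disk = {}           # stands in for the slow disk filesystem
        self.batch_size = batch_size

    def sync_write(self, path, offset, data):
        # Absorb the synchronous write in NVM at fine granularity.
        self.nvm_log.append((path, offset, data))
        if len(self.nvm_log) >= self.batch_size:
            self.checkpoint()

    def checkpoint(self):
        # Batch the accumulated records into one pass over the disk FS,
        # then truncate the log (its contents are now durable on disk).
        for path, offset, data in self.nvm_log:
            buf = bytearray(self.disk.get(path, b""))
            if len(buf) < offset + len(data):
                buf.extend(b"\0" * (offset + len(data) - len(buf)))
            buf[offset:offset + len(data)] = data
            self.disk[path] = bytes(buf)
        self.nvm_log.clear()

    def read(self, path):
        # Recent writes may still live only in the log; replay them.
        buf = bytearray(self.disk.get(path, b""))
        for p, offset, data in self.nvm_log:
            if p == path:
                if len(buf) < offset + len(data):
                    buf.extend(b"\0" * (offset + len(data) - len(buf)))
                buf[offset:offset + len(data)] = data
        return bytes(buf)
```

After a crash, replaying the surviving NVM log against the disk image restores consistency; NVLog's real crash-consistency machinery is of course far more involved.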
Academic Significance
Bridges the gap between fast NVM and slower disks in a novel way. Unlike prior NVM-specialized filesystems, NVLog retrofits a conventional FS to exploit NVM’s speed without rewriting the whole stack. This offers a new research direction for heterogeneous storage: using a small NVM buffer to get near-NVM performance on legacy storage, while solving consistency across two media.
Industry Significance
Demonstrates significant performance boosts – disk filesystems accelerated by up to 15×, even outperforming some native NVM file systems in certain scenarios. For storage vendors and systems like Linux ext4 or NTFS, this means they can dramatically improve throughput and fsync latency by adding an NVM module. It’s an elegant upgrade path for existing enterprise storage stacks using new memory tech.
DJFS: Directory-Granularity Filesystem Journaling for CMM-H SSDs
Main Innovation
Proposes DJFS, a journaling filesystem that operates at per-directory transaction granularity rather than a single global journal. By grouping file operations by directory, DJFS reduces contention and allows parallel journal commits on many-core systems. Key techniques include path-based transaction selection and coalescing, which let multiple directories’ updates be logged independently.
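The per-directory grouping can be illustrated with a minimal sketch (the names and structure are hypothetical, not DJFS's code): each directory gets its own journal, so commits touch only that journal and independent directories never contend.

```python
from collections import defaultdict

class DirJournalSketch:
    """Toy per-directory journaling: operations are grouped by the
    directory component of their path, so independent directories can
    commit without sharing a single global journal lock."""
    def __init__(self):
        self.journals = defaultdict(list)  # directory -> pending ops
        self.committed = []

    def log_op(self, path, op):
        # Path-based transaction selection: pick the journal by directory.
        directory = path.rsplit("/", 1)[0] or "/"
        self.journals[directory].append((path, op))

    def commit_dir(self, directory):
        # Commit one directory's transaction; other journals are untouched
        # (in DJFS such commits proceed in parallel on separate cores).
        ops = self.journals.pop(directory, [])
        self.committed.extend(ops)
        return len(ops)
```

In the real system the commits for different directories run concurrently; here the point is only that each commit's working set is confined to one directory's journal.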
Academic Significance
Rethinks filesystem journaling in the era of multi-core CPUs and ultra-fast SSDs. DJFS addresses fundamental bottlenecks (lock contention, sequential commits) in classic journaling by isolating update streams per directory. This is a breakthrough in filesystem design, enabling significantly higher concurrency and making journaling scale with core count, an important step for modern OS research.
Industry Significance
Yields major performance gains on real workloads – e.g. 4.5× throughput on a mail-server workload (Varmail) and ~3× in other benchmarks compared to state-of-the-art ext4 journaling. For storage systems and OS vendors, DJFS means journaling no longer has to be a serialization point. This is especially relevant as NVMe SSDs and new memory buses (like CXL) make I/O much faster, shifting the bottleneck to software.
ScaleLFS: A Log-Structured File System with Scalable Garbage Collection for Commodity SSDs
Main Innovation
Introduces ScaleLFS, a log-structured file system that overhauls garbage collection (GC) for scalability on multi-core systems. It deploys per-core dedicated GC threads to parallelize the cleaning of log segments, a concurrent victim selection manager, and a page-level “victim protector” mechanism to avoid conflicts during GC. Together, these allow the LFS to perform GC without pausing the whole FS, maintaining high throughput.
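Victim selection for parallel GC workers can be sketched as a greedy assignment. This is a simplification of ScaleLFS's concurrent victim selection manager, with a `protected` set loosely standing in for the page-level victim protector; the function name and signature are illustrative.

```python
def select_victims(segments, protected, n_workers):
    """Toy concurrent victim selection: hand each GC worker the
    lowest-utilization segment that no other worker has claimed and
    that the victim protector has not marked as in use."""
    claimed, assignment = set(), {}
    # Greedy: the cheapest segments to clean hold the fewest live blocks.
    for seg_id, live_ratio in sorted(segments.items(), key=lambda kv: kv[1]):
        if seg_id in protected or seg_id in claimed:
            continue
        worker = len(assignment)
        assignment[worker] = seg_id
        claimed.add(seg_id)
        if len(assignment) == n_workers:
            break
    return assignment
```

Because each worker owns a distinct victim segment, cleaning proceeds on all of them at once without pausing foreground I/O.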
Academic Significance
Tackles the classic LFS challenge – cleaning overhead – by using parallelism and fine-grained locking in new ways. ScaleLFS demonstrates that an LFS can scale with CPU cores, dramatically reducing the stalls typically caused by GC. This advances file system research by showing how to sustain LFS performance over time (no more huge drops during cleaning) and makes LFS viable on modern SSDs.
Industry Significance
Achieves up to 3.5×–7× higher sustained performance than Linux F2FS and other GC schemes. That means more consistent I/O rates and lower latency for applications even as disks fill up. For industry, this could improve user experience on devices using F2FS (Android phones) or any storage array using log-structured designs by simply improving how cleaning is done. It extends SSD lifespan and performance stability – both highly valued in enterprise storage.
Rethinking the Request-to-IO Transformation Process of File Systems for High-Bandwidth SSDs
Main Innovation
Proposes OrchFS, a heterogeneous I/O orchestration file system that splits write handling between SSD and NVM to fully saturate modern NVMe SSD bandwidth. It identifies three inefficiencies in current file systems (alignment overhead, page cache overhead, insufficient concurrency) and solves them by partitioning writes: aligned portions go directly to SSD, misaligned parts go to a small NVM buffer. OrchFS uses a multi-threaded, parallel I/O engine and a unified mapping structure to manage this transparently.
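The write-partitioning rule itself is simple arithmetic. A sketch, assuming 4KB blocks (the function name and return shape are illustrative, not OrchFS's API):

```python
BLOCK = 4096

def split_write(offset, length, block=BLOCK):
    """Toy version of OrchFS write partitioning: the block-aligned middle
    of a write goes straight to the SSD; misaligned head/tail slivers go
    to a small NVM buffer. Returns (nvm_head, ssd_middle, nvm_tail) as
    (offset, length) pairs, with None for empty parts."""
    end = offset + length
    aligned_start = (offset + block - 1) // block * block  # round up
    aligned_end = end // block * block                     # round down
    if aligned_start >= aligned_end:       # no full block in the range
        return (offset, length), None, None
    head = (offset, aligned_start - offset) if offset != aligned_start else None
    tail = (aligned_end, end - aligned_end) if end != aligned_end else None
    return head, (aligned_start, aligned_end - aligned_start), tail
```

The aligned middle avoids read-modify-write on the SSD entirely, while the slivers land in NVM at byte granularity, which is exactly where NVM's fine-grained writes pay off.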
Academic Significance
Marks a shift in file system design to treat different storage tiers in parallel rather than just caching. It highlights that simply using faster SSDs isn’t enough — the OS must avoid legacy assumptions (like 4KB alignment and single-queue writes) that leave performance on the table. This work contributes a new architecture that combines memory and storage in tandem, significantly advancing how we think about OS-level I/O scheduling for ultra-fast devices.
Industry Significance
OrchFS achieves dramatic speedups – up to 29.7× faster writes and 6.8× faster reads compared to Linux ext4/F2FS or even state-of-the-art NVM/hybrid file systems. For practical systems, this means high-bandwidth SSDs (like PCIe 4.0/5.0 drives) can actually reach their potential. Data-intensive applications (databases, analytics) could see order-of-magnitude improvement by adopting similar techniques, without requiring completely new hardware.
FlacIO: Flat and Collective I/O for Container Image Service
Main Innovation
Develops FlacIO, an accelerator for container image distribution that introduces a “runtime image” abstraction. Instead of treating container images as monolithic layers to download, FlacIO represents the image as the in-memory state of a container’s root filesystem (the runtime image). It uses this to perform efficient, on-demand loading and sharing: a runtime page cache on each host serves pages to containers as needed, drastically reducing network transfer and I/O amplification.
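The runtime page cache behaves like a demand-fill cache shared by all containers on a host. A minimal sketch (class and method names are hypothetical):

```python
class RuntimePageCache:
    """Toy per-host runtime page cache: containers request pages of a
    root filesystem on demand; the first request fetches from the remote
    image store, later requests (from any container) are served locally."""
    def __init__(self, remote_fetch):
        self.remote_fetch = remote_fetch   # callable: page_id -> bytes
        self.cache = {}
        self.remote_reads = 0

    def read_page(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.remote_fetch(page_id)
            self.remote_reads += 1
        return self.cache[page_id]
```

Only pages a container actually touches ever cross the network, and a page fetched for one container is free for every later container on the host, which is where the reduction in transfer and I/O amplification comes from.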
Academic Significance
Provides a new approach to solving container cold-starts by shifting from a storage-centric to a memory-centric model. It identifies root causes of slow startups (I/O amplification, redundant transfers) and addresses them by reimagining image delivery as service-oriented caching rather than downloading entire images. This is a notable contribution to virtualization and storage research, bridging distributed storage and operating systems to eliminate inefficiencies in container orchestration.
Industry Significance
FlacIO shows impressive gains: container cold-launch latency improves by up to 23× vs full image pulls (and 4.6× vs existing lazy loading). In real-world scenarios (object storage gateways, ML training), it boosts throughput by ~2.25× over current solutions. These results mean cloud providers and Kubernetes environments can achieve much faster scaling and better utilization. Reducing image distribution time translates to cost savings (less network usage, faster deployments) and better user experiences (quick start of services).
Cloudscape: A Study of Storage Services in Modern Cloud Architectures
Main Innovation
Presents Cloudscape, a large-scale empirical study analyzing ~400 real-world cloud system architectures (from AWS) to understand how storage services are used. It mines publicly available architecture diagrams/videos to quantify service usage. Key findings include: S3 object storage is by far the most used storage (68% of systems), file storage is rare (4%), most architectures use multiple storage types together, and storage services often connect directly with serverless functions (Lambda) and VMs.
Academic Significance
This work provides a data-driven foundation for cloud storage research. It’s the first comprehensive survey of how practitioners compose cloud storage and compute services in the wild. Such insights help researchers focus on relevant problems (e.g., the dominance of object storage and heterogeneity suggests areas for optimization). Cloudscape essentially bridges the gap between academic assumptions and industry practice by showing what real designs look like at scale.
Industry Significance
For cloud providers and architects, Cloudscape’s analysis is a goldmine of best practices and common pitfalls. It confirms, for example, the central role of object storage (S3) in modern workloads, highlighting the need to optimize it, and shows that mixing storage types (object, database, analytics stores) is common. This can drive product decisions (such as improving integration between services, or creating new solutions for the rare use cases like cloud file systems). Essentially, it grounds the industry’s understanding of “typical” cloud architectures with hard data.
Maat: Analyzing and Optimizing Overcharge on Blockchain Storage
Main Innovation
Identifies and fixes “overcharge” inefficiencies in Ethereum’s storage fee model. Maat is a tool that detects when blockchain transactions pay more gas fees for storage operations than the actual resource usage warrants. Maat introduces techniques like fine-grained gas profiling (focused on high-level storage actions like account/contract writes) and consensus-consistent fee optimizations, meaning it can adjust fee calculations without breaking consensus across nodes.
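The overcharge check itself reduces to comparing the gas charged against a metered cost profile of the transaction's storage actions. A toy sketch with made-up costs (Maat's real profiling is finer-grained and consensus-consistent):

```python
def storage_overcharge(tx_ops, charged_gas, actual_cost):
    """Toy gas profile: sum the metered cost of a transaction's
    high-level storage actions (account/contract writes) and report how
    much more gas was charged than the work warranted. The cost table
    here is illustrative, not Ethereum's schedule."""
    used = sum(actual_cost[op] for op in tx_ops)
    return max(0, charged_gas - used)
```

Summed over all transactions, this difference is the aggregate overcharge the paper quantifies.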
Academic Significance
Bridges blockchain technology and systems optimization. Maat’s analysis revealed three major causes of fee overcharging in Ethereum clients and that a staggering ~92% of transactions on BSC (an Ethereum fork) were overpaying, totaling $11M excess fees. Academically, this is significant as it quantifies a previously undocumented inefficiency in a widely-used system and provides a concrete solution. It demonstrates a new kind of storage optimization – not about speed or space, but economic efficiency and fairness in resource accounting.
Industry Significance
By addressing these overcharges, Maat can potentially save blockchain users and smart contract developers huge costs (it can reduce overpaid fees by an estimated $5.6M per week on Ethereum-scale networks). For blockchain networks (Ethereum, Layer2s, etc.), adopting Maat’s optimizations means a more accurate fee market and possibly lower barrier for using on-chain storage. It could influence Ethereum improvement proposals for gas cost adjustments. Essentially, it’s a direct monetary impact: making blockchain storage more cost-effective and efficient.
Revisiting Network Coding for Warm Blob Storage
Main Innovation
Proposes NCBlob, a blob storage system that applies non-systematic network coding (MSR codes) to small and medium-sized objects to speed up repair times. Unlike traditional erasure-coded storage which keeps original data blocks (systematic), NCBlob stores only coded blocks and leverages a technique called All-at-Once random linear coding from network coding theory. This eliminates the need for reading many small sub-blocks scattered across drives during repairs, thereby significantly improving I/O efficiency when reconstructing lost data.
Academic Significance
Merges coding theory with practical storage system design. It challenges the assumption that systematic codes are always preferable by showing that for warm storage with lots of small blobs, non-systematic codes can reduce random I/O overhead in repairs. This is a fresh perspective in the erasure coding literature, demonstrating a scenario where a purely coded approach outperforms systematic codes in real systems. It advances the theory of MSR (minimum-storage regenerating) codes by validating their benefits (low repair bandwidth) in a deployed system context with small-object workloads.
Industry Significance
On a large cloud storage (Alibaba Cloud in experiments), NCBlob cut single-block repair time by 45% and full-node (entire disk) rebuild time by 38%, with negligible impact (~2%) on normal read throughput. This is highly attractive for providers of object storage: faster repairs mean higher data durability and availability (downtime/recovery windows shrink). It suggests that future distributed storage clusters can handle disk failures more gracefully, using less network and completing rebuilds faster – directly translating to cost savings and reliability improvements.
Mooncake: Trading More Storage for Less Computation – A KVCache-centric Architecture for Serving LLM Chatbot
Main Innovation
Describes Mooncake, a production LLM serving platform that uses a KVCache-centric disaggregated architecture to maximize throughput for chatbot workloads. It separates the inference pipeline into a prefill stage and a decode stage, running on different clusters, and introduces a global Key-Value cache accessible over the network. This design offloads and shares the transformer attention key/value tensors across requests, heavily utilizing CPU, DRAM, SSD, and NIC resources on GPU nodes to serve cached content instead of recomputing it. Essentially, it trades cheap storage (to cache intermediate results) to save expensive GPU compute time.
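The cache-reuse core of the prefill stage can be sketched as longest-prefix matching over stored KV states. This is a toy, single-node model; Mooncake's cache is distributed, block-granular, and transferred over the NIC, and `compute_kv` stands in for the expensive GPU prefill.

```python
def serve_prompt(tokens, kv_cache, compute_kv):
    """Toy KVCache-centric prefill: find the longest cached token
    prefix, reuse its stored KV state, and compute attention only for
    the remaining suffix. kv_cache maps token-tuple prefixes to
    (opaque) KV states."""
    best = 0
    for n in range(len(tokens), 0, -1):
        if tuple(tokens[:n]) in kv_cache:
            best = n
            break
    reused = kv_cache.get(tuple(tokens[:best]))
    computed = compute_kv(tokens[best:])       # the expensive GPU work
    kv_cache[tuple(tokens)] = (reused, computed)  # extend the cache
    return best, len(tokens) - best            # (reused, recomputed)
```

Chat workloads resubmit long shared prefixes (system prompts, conversation history), so the reused fraction is large and the saved GPU time dominates the storage spent on the cache.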
Academic Significance
Pioneers a new system architecture for large-scale AI serving, showing how classical ideas of caching and disaggregation can be applied to modern AI (LLM inference). It validates that splitting an AI service into specialized components (and using all available hardware resources, not just GPUs) yields significant performance benefits. This is a notable direction for systems research at the intersection of storage and AI: leveraging storage hierarchy (RAM/SSD) to alleviate computation bottlenecks in neural network serving.
Industry Significance
Mooncake is deployed at scale (thousands of nodes, >100 billion tokens/day) and boosted request capacity by 59%–498% on long-context queries compared to conventional setups. In production, it allowed their chatbot service “Kimi” to handle ~2× more requests on the same GPU hardware by meeting strict latency SLOs through caching. This is a breakthrough for any company serving LLMs (e.g., OpenAI, Microsoft, AI startups) because it drives down cost per query and improves user latency. It demonstrates that careful system engineering (not just model improvements) can yield order-of-magnitude gains in AI service scalability.
Towards High-Throughput and Low-Latency Billion-Scale Vector Search
Main Innovation
Introduces FusionANN (as described in the paper), an approximate nearest neighbor search solution that cleverly splits work between CPU and GPU to handle billion-scale vector datasets with minimal hardware. The system uses a multi-tier index: an in-memory graph or inverted file on the CPU side prunes the search space, and only a small list of candidate vector IDs is sent to the GPU, which fetches and re-ranks those vectors (from SSD if needed) with high precision. By minimizing data transfer and using each processor for what it does best (CPU for graph traversal, GPU for distance computation), it achieves both high throughput and low latency using just one entry-level GPU plus an SSD for billions of vectors.
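The two-stage split can be sketched as coarse cluster pruning followed by exact re-ranking of the survivors. Pure Python stands in for both processors here; the handoff of the small candidate list is the part that would cross the CPU-GPU boundary.

```python
def search(query, vectors, clusters, centroids, nprobe=1, topk=1):
    """Toy CPU/GPU split: the 'CPU stage' prunes the dataset to the
    nprobe nearest clusters (cheap), and only those candidate IDs are
    handed to the 'GPU stage' for exact re-ranking (expensive)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # CPU stage: rank clusters by centroid distance, collect candidates.
    order = sorted(range(len(centroids)), key=lambda c: d2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in clusters[c]]
    # GPU stage (simulated): exact distances over the small candidate set,
    # fetching the full vectors (from SSD, in the real system) only here.
    return sorted(candidates, key=lambda i: d2(query, vectors[i]))[:topk]
```

Because the expensive stage only ever sees the pruned candidate list, both the PCIe transfer volume and the SSD reads stay small even at billion scale.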
Academic Significance
FusionANN is a significant step in data structure and search algorithm design, demonstrating that a hybrid CPU–GPU approach can beat purely GPU or purely CPU approaches for very large data. It tackles memory limitation issues (many ANN indices need terabytes of RAM) by using SSD storage and still maintains speed through careful collaboration and filtering. This pushes the boundary of what’s possible in approximate similarity search, enabling scalability to datasets that were previously infeasible without enormous clusters. It also highlights a design pattern for other systems: use CPU algorithms to reduce the work that a GPU needs to do, thereby overcoming I/O bottlenecks.
Industry Significance
High-quality vector search is crucial for applications like image retrieval, recommendation, and AI embedding search. FusionANN’s ability to run billion-scale ANN on modest hardware (one GPU, standard NVMe drives) is a cost-performance breakthrough. It lowers the total cost of ownership for deploying large vector databases, which is attractive for cloud services offering AI search. The paper notes it achieves high throughput and accuracy without the huge memory footprint of prior solutions. This means companies can provide vector search on massive data with lower infrastructure (no need for dozens of GPUs or expensive RAM-heavy machines), making advanced search more accessible and scalable.
IMPRESS: An Importance-Informed Multi-Tier Prefix KV Storage System
Main Innovation
Proposes IMPRESS, a multi-tier key-value store for caching prefix states in LLM inference (the keys and values from attention layers for prompt prefixes). Its key idea is to only retain and load the “important” portions of those prefix KV pairs when memory is insufficient. IMPRESS analyzes attention patterns and finds that many tokens across heads share similar “important token indices.” Using this, it identifies which parts of the prefix state contribute most to reducing latency, and it caches those in faster tiers (GPU or CPU memory), while less critical parts can reside on slower storage or be recomputed. The system includes an importance-aware caching policy and a dual-tier (RAM + disk) storage design that slashes the time-to-first-token for long prompts.
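Importance-informed placement reduces to ranking tokens by attention mass and keeping the top of the ranking in the fast tier. A toy sketch (IMPRESS's real importance metric is computed across heads and layers, and the tiers include GPU memory, CPU memory, and disk):

```python
def place_prefix_kv(attn_scores, fast_budget):
    """Toy importance-informed placement: tokens whose attention mass is
    highest go to the fast tier (GPU/CPU memory); the rest stay on the
    slow tier and are only loaded (or recomputed) if actually needed.
    attn_scores[i] is the aggregate attention mass of prefix token i."""
    order = sorted(range(len(attn_scores)), key=lambda i: -attn_scores[i])
    fast = sorted(order[:fast_budget])   # token indices kept hot
    slow = sorted(order[fast_budget:])   # token indices demoted
    return fast, slow
```

The observation that important token indices are shared across heads is what makes a single such ranking meaningful for the whole prefix.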
Academic Significance
This work is a novel blend of machine learning insight with systems design. It recognizes that not all data in an LLM prefix is equal for performance, introducing a quantitative definition of “importance” in the context of neural attention. This is an emerging direction: performance-aware ML systems, where one leverages model internals (like attention distributions) to inform caching. It contributes a methodology to compute and use importance across multi-level storage, which can be applied to other contexts of large-scale inference or even caching in databases (e.g., caching most impactful queries). It also extends research on tiered storage by incorporating semantic importance rather than just recency or frequency.
Industry Significance
IMPRESS demonstrates up to 2.8× faster response (TTFT) for LLM queries with long contexts, compared to state-of-the-art attention caching methods. For any real-time LLM application (chatbots, assistants with long conversation history, etc.), this directly translates to snappier responses and better user experience. Importantly, it achieves this without needing all data in expensive GPU or CPU memory – it smartly leverages disk when needed without hurting latency. This can reduce infrastructure costs for serving large-context models, as one can use smaller memory configurations but still get the benefit of caching. It’s an immediate win for cloud providers serving GPT-style models: support longer prompts and throughput without linear slowdown.
GPHash: An Efficient Hash Index for GPU with Byte-Granularity Persistent Memory
Main Innovation
Introduces GPHash, a hashing index designed specifically for a GPU+Persistent Memory (GPM) system, where a GPU can directly perform fine-grained reads/writes to a persistent memory (PM) store. GPHash is lock-free and warp-cooperative, meaning it aligns with GPU thread warps to do concurrent operations without traditional locks, thus avoiding warp divergence and under-utilization. It also provides lightweight crash consistency using atomic compare-and-swap (CAS) and metadata bits instead of heavy logging. To bridge the GPU–PM speed mismatch, GPHash caches hot entries in the GPU’s own memory, minimizing expensive PM access.
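The lock-free insert path can be sketched with a simulated compare-and-swap. On a real GPU this is an atomic instruction and a whole warp probes a bucket's slots cooperatively; the single-threaded Python below only illustrates the claim-by-CAS logic, with hypothetical names.

```python
def cas(slots, idx, expected, new):
    """Stand-in for an atomic compare-and-swap (atomic on real hardware)."""
    if slots[idx] == expected:
        slots[idx] = new
        return True
    return False

def insert(slots, key, value, hash_fn=hash):
    """Toy lock-free open-addressing insert: a thread claims an empty
    slot with CAS instead of taking a lock; if the CAS loses to a
    concurrent writer, it simply probes onward."""
    n = len(slots)
    for probe in range(n):
        idx = (hash_fn(key) + probe) % n
        if slots[idx] is not None and slots[idx][0] == key:
            return False                      # key already present
        if cas(slots, idx, None, (key, value)):
            return True                       # slot claimed atomically
    return False                              # table full
```

Because the slot claim is a single CAS on the PM-resident entry, a crash leaves each slot either old or new, which is the basis of the lightweight crash consistency.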
Academic Significance
This is a pioneering data structure for the emerging paradigm of GPUs operating on persistent storage directly. It addresses challenges unique to that paradigm (e.g., ensuring consistency across power loss on GPU writes, and coordinating threads in SIMT fashion) that traditional CPU indexing techniques don’t handle. GPHash pushes forward research on GPU databases and key-value stores, showing that with careful design, GPUs can do more than just computation – they can manage storage with high performance. It’s a big step towards GPU-accelerated storage systems, adding to academic knowledge on how to exploit massive parallelism for I/O-bound tasks.
Industry Significance
In benchmarks (YCSB, etc.), GPHash outperformed existing CPU-based or naive GPU indexing by up to 27.6×. This implies a huge opportunity for industries that need real-time big data processing: for instance, a GPU could maintain a very large hash table (billions of entries) in persistent memory and serve queries orders of magnitude faster than CPU solutions. Real-world scenarios like online recommendation systems (with large embedding tables) or analytics could use GPHash to cut latency drastically. It also suggests future hardware/software co-design: if vendors provide GPUs with direct PM access, software like GPHash can leverage it for ultra-fast databases or caching systems, potentially revolutionizing high-performance OLAP/OLTP solutions.
Archer: Adaptive Memory Compression with Page-Association-Rule Awareness
Main Innovation
Proposes Archer, a memory compression framework for mobile OSes (Android/Linux) that breaks the conventional page-by-page compression strategy. Archer observes that ~25% of memory pages have implicit correlations (association rules) with each other. It uses this insight to adapt compression granularity: instead of always compressing each 4KB page independently, Archer sometimes compresses groups of associated pages together or leaves certain pages uncompressed if their partners are in use. It includes a redesigned LRU replacement that factors in these page associations and an adaptive “compression region” that can dynamically switch between page-wise and group-wise compression.
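The benefit of group-wise compression for associated pages is easy to see with a standard compressor (zlib here as a stand-in for the kernel's compressor; the functions are illustrative, not Archer's interface):

```python
import zlib

def compress_pagewise(pages):
    """Conventional scheme: each 4KB page compressed on its own."""
    return [zlib.compress(p) for p in pages]

def compress_grouped(pages):
    """Archer-style group-wise compression for pages that an association
    rule says are used together: one compressed blob for the group, so
    redundancy *across* the pages is exploited, not just within each."""
    return zlib.compress(b"".join(pages))
```

The trade-off is that a grouped blob must be decompressed as a unit when any member page faults, which is why Archer only groups pages whose association rules predict they fault together, and adapts the granularity otherwise.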
Academic Significance
This work brings techniques from data mining (association rule mining) into OS memory management – an interdisciplinary leap. It challenges the long-held assumption that paging must be uniform, showing that variable-granularity compression can outperform fixed 4KB compression in terms of speed. For the academic community, Archer opens a new line of research into semantic- or relationship-aware memory management (treating memory not just as anonymous pages, but considering their relationships). It also contributes to mobile systems research by targeting user-facing performance (app launch, frame rates) and demonstrating a substantial improvement via OS-level innovations.
Industry Significance
Archer yielded concrete user-perceivable improvements: app launch times sped up by 1.55×, camera shot-to-shot time by 1.42×, and graphics frame rates by 1.31× on average compared to the best current Android compression system. For smartphone makers and OS developers, this means devices can feel significantly faster without adding hardware, simply by smarter software. Given that memory is a tight resource on mobiles, Archer’s ability to “create” more memory space while also boosting speed (traditionally compression trades speed for space) is extremely valuable. It could prolong the life of older devices or reduce the need to pack in more RAM, which has cost and energy benefits.
VectorCDC: Accelerating Data Deduplication with Vector Instructions
Main Innovation
VectorCDC leverages modern CPU SIMD instructions to speed up content-defined chunking and fingerprinting in data deduplication. It offloads the computationally heavy parts of deduplication – like rolling hash calculations for variable-size chunk boundaries, and chunk fingerprint comparisons – to wide vector registers, processing multiple bytes or hashes in parallel. The design likely involves tailoring the chunking algorithm (e.g., Rabin fingerprint or similar) to be SIMD-friendly, and processing multiple streams of data in one vector operation. By doing so, it achieves much higher throughput in identifying duplicate content segments without sacrificing chunking quality.
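Content-defined chunking looks like this in scalar form. This is a generic Gear-style rolling hash, not VectorCDC's exact algorithm; the sequential inner byte loop is precisely what a SIMD version replaces with wide vector scans.

```python
import random

# 256-entry random "gear" table; fixed seed so boundaries are stable.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK = 0x1FFF          # 13 mask bits -> ~8KB average chunk size

def cdc_chunks(data, min_size=2048, max_size=65536):
    """Scalar Gear-based content-defined chunking: a byte-wise rolling
    hash declares a chunk boundary when its low bits are all zero, so
    boundaries depend on content, not offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        if (i + 1 - start >= min_size and (h & MASK) == 0) \
                or i + 1 - start >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because boundaries are content-defined, an insertion early in a stream shifts only nearby boundaries, and identical chunks downstream still deduplicate.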
Academic Significance
Deduplication has been traditionally CPU-bound; VectorCDC is among the first to apply data-parallel CPU capabilities to this problem in a general way. It advances the state of the art in storage efficiency by demonstrating that algorithmic improvements (making them SIMD-aware) can keep up with rising storage volumes. This work also contributes to the broader theme of hardware-software co-design: it adapts dedup algorithms to fit the underlying hardware (vector units), showing how low-level optimization can yield system-level gains. It may open doors for applying similar SIMD acceleration to other storage algorithms (like compression, RAID parity calculations, etc.), validating that even “classic” algorithms benefit from revisiting under the lens of modern CPU architecture.
Industry Significance
By accelerating the deduplication process, VectorCDC allows storage systems to eliminate redundant data with minimal performance penalty. This means backup systems, archival storage, or primary storage with inline deduplication can handle higher data rates (e.g., faster backups, real-time dedup on faster networks) without needing specialized hardware. If, for example, VectorCDC doubles or triples dedup throughput on common servers (the paper likely reports significant speedups under SIMD), data centers can reduce storage usage more aggressively and in more scenarios. This translates to cost savings on disk capacity and bandwidth. It also extends the feasible use of deduplication to edge and client devices (which have SIMD-capable CPUs) by mitigating the CPU load.
Oasis: An Out-of-core Approximate Graph System via All-Distances Sketches
Main Innovation
Oasis is a graph processing system that enables analyzing massive graphs out-of-core (on disk) by using All-Distances Sketches (ADS) for approximation. ADS is a probabilistic summary of graph neighborhoods that can estimate distances or connectivity without traversing the full graph. Oasis builds these sketches efficiently for graphs larger than memory and uses them to answer queries (like centrality, similarity, reachability) approximately but much faster than exact methods. Key innovations include new algorithms to construct ADS from disk by streaming through graph data and techniques to reduce I/O (e.g., compressing or partitioning sketches) so that even with limited RAM, the system can process very large graphs.
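The sketch machinery can be illustrated with a bottom-k estimator, the building block of ADS. A real ADS keeps the bottom-k ranks per distance threshold; this toy collapses that to plain reachability, with exact BFS standing in for the quantity the sketch approximates.

```python
def bottom_k_sketch(ranks, node_set, k):
    """Bottom-k sketch of a set: the k smallest random ranks among its
    members. |set| is then estimated as (k - 1) / kth_smallest_rank."""
    return sorted(ranks[v] for v in node_set)[:k]

def estimate_size(sketch, k):
    if len(sketch) < k:
        return len(sketch)          # set smaller than k: sketch is exact
    return (k - 1) / sketch[-1]

def reachable(graph, src):
    """Exact BFS reachability, for comparison; Oasis instead builds the
    sketches by streaming the graph from disk, never materializing
    these full sets in memory."""
    seen, frontier = {src}, [src]
    while frontier:
        nxt = [w for v in frontier for w in graph.get(v, []) if w not in seen]
        seen.update(nxt)
        frontier = nxt
    return seen
```

The sketch is O(k) per node regardless of graph size, which is what lets an out-of-core system answer neighborhood-size and distance queries without traversing the full graph.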
Academic Significance
Integrates algorithmic theory (ADS from theoretical computer science) with systems (out-of-core graph processing). Graph algorithms often struggle once data exceeds memory; Oasis shows a path forward by trading off a bounded accuracy loss for huge gains in scalability and speed. It contributes to approximation algorithms applied in practice, demonstrating that carefully designed sketches can maintain high accuracy for real-world graph analytics while cutting down resource usage. This is an emerging direction for big data: approximate computing as a first-class citizen in system design. For the research community, Oasis provides evidence that one can process graphs with billions of edges on a single machine, which was previously nearly impossible, thereby expanding what problems are tractable.
Industry Significance
Many industry problems involve giant graphs (social networks, web links, recommendation graphs). Oasis, by enabling analysis on graphs at the enormous (disk-sized) scale with moderate hardware, means companies could extract insights from data previously too large to handle without a cluster. The approximate nature still yields meaningful answers with known error bounds, which is often acceptable in analytics. This lowers cost: instead of huge in-memory clusters, a single server with SSDs might do the job. It also can drastically speed up analytics; for example, calculating user influence or shortest-path based metrics in a social graph could be done in hours instead of days. This opens opportunities for more timely analytics and perhaps real-time features based on graph metrics.
PolyStore: Exploiting Combined Capabilities of Heterogeneous Storage
Main Innovation
PolyStore is a new storage architecture that abandons the traditional caching/tiering hierarchy and instead uses multiple storage devices in parallel (“horizontal” storage). It adds a meta-layer that spans OS and user space, allowing fine-grained data placement across, say, an NVMe SSD, a SATA SSD, and persistent memory concurrently. Rather than always write to the fastest device and migrate data down, PolyStore can, for example, stripe or distribute I/O across all devices to use their combined bandwidth. It maintains a dual-layer translation table and ensures that consistency, security, and sharing semantics are preserved while bypassing the bottlenecks of a single-tier approach. Essentially, it flattens the storage hierarchy so all media contribute to performance.
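The “horizontal” placement idea can be sketched as bandwidth-proportional striping of a single write across all devices (a simplification; PolyStore's placement is finer-grained and policy-driven, and the names here are illustrative):

```python
def stripe(length, devices):
    """Toy horizontal placement: split one write across all devices in
    proportion to their bandwidth, so aggregate bandwidth is used
    instead of funneling everything through the fastest tier.
    devices: list of (name, bandwidth_mb_s) pairs."""
    total_bw = sum(bw for _, bw in devices)
    plan, offset = [], 0
    for i, (name, bw) in enumerate(devices):
        # Last device absorbs the integer-division remainder.
        share = length - offset if i == len(devices) - 1 \
            else length * bw // total_bw
        if share:
            plan.append((name, offset, share))
        offset += share
    return plan
```

Under a hierarchy, the same write would be bounded by the fastest device alone; here the achievable bandwidth approaches the sum across devices.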
Academic Significance
Challenges a core assumption in storage systems that a hierarchy (fast cache above slow storage) is optimal. PolyStore’s evaluation shows that this assumption breaks down with modern devices – e.g., you can waste write bandwidth by only using the top tier. By successfully designing a system that treats heterogeneous storage as teammates rather than master-slave, it provides a new blueprint for storage research. It also had to solve non-trivial issues: ensuring data coherence and avoiding contention when multiple devices are active. This contributes knowledge on how to coordinate different filesystems and device drivers in unison. PolyStore’s approach could spark a rethinking of I/O scheduling and data management in both operating systems and database systems to leverage all available hardware parallelism.
Industry Significance
PolyStore achieved between 1.1× up to 9.4× better performance on micro-benchmarks and ~1.5–2× on real applications, compared to using a hierarchy on the same devices. This means a system with, for instance, an Optane SSD and a NAND SSD could see nearly double overall throughput by using both fully, rather than one as cache. For enterprise IT, it offers better ROI on hardware – every device’s full bandwidth is utilized. It could be especially beneficial in scenarios like databases or analytics pipelines where there are distinct I/O streams (some random, some sequential) that could be split across devices. Industry storage solutions could adopt PolyStore’s meta-layer to differentiate their software, promising customers that adding a mix of storage types will immediately add performance (not just latency improvements).
Liquid-State Drive: A Case for DNA Block Device for Enormous Data
Main Innovation
Proposes the concept of a “Liquid-State Drive” (LiqSD), which is essentially a DNA-based block storage device augmented by conventional SSDs. Because synthetic DNA storage has extreme density but high latency and write-once characteristics, LiqSD introduces a dual-layer design: a small fast SSD holds a lightweight mapping table and buffers recent writes, while bulk data is stored in DNA archives. A dual translation layer (one in SSD for quick lookup, one for the DNA media) allows the system to present a standard block interface to the OS. Innovations include methods to reduce metadata updates (so you don’t frequently rewrite DNA), techniques for grouping writes to amortize the cost of DNA synthesis, and error-correction schemes tuned to DNA’s error patterns – all packaged to behave like a very high-latency, high-capacity disk.
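The dual-layer design can be illustrated with a toy model (my own sketch under assumptions; the `BATCH` size, class names, and data structures are invented, and real DNA synthesis would of course happen out of band): recent writes land in an SSD buffer, and once a batch fills, the blocks are sealed into an immutable archive segment with a single metadata update covering the whole batch:

```python
# Illustrative sketch (not LiqSD's actual code) of the dual-layer idea:
# an SSD front-end buffers recent writes and holds the mapping table,
# while full batches are sealed into write-once "DNA" archive segments,
# amortizing synthesis cost and avoiding per-write metadata rewrites.

BATCH = 4                          # blocks per synthesis batch (made up)

class LiquidStateDrive:
    def __init__(self):
        self.ssd_buffer = {}       # block addr -> data, recent writes
        self.mapping = {}          # block addr -> (segment id, offset)
        self.dna_segments = []     # immutable, append-only archive

    def write(self, addr, data):
        self.ssd_buffer[addr] = data
        if len(self.ssd_buffer) >= BATCH:
            self._flush_to_dna()

    def _flush_to_dna(self):
        # Group buffered blocks into one immutable segment; one metadata
        # update covers the entire batch instead of one per write.
        seg_id = len(self.dna_segments)
        segment = []
        for offset, (addr, data) in enumerate(sorted(self.ssd_buffer.items())):
            segment.append(data)
            self.mapping[addr] = (seg_id, offset)
        self.dna_segments.append(tuple(segment))   # write-once media
        self.ssd_buffer.clear()

    def read(self, addr):
        if addr in self.ssd_buffer:                # fast path: SSD
            return self.ssd_buffer[addr]
        seg_id, offset = self.mapping[addr]        # slow path: DNA
        return self.dna_segments[seg_id][offset]
```

Reads check the SSD buffer first and fall back to the archive via the mapping table, mirroring how LiqSD serves performance-critical data from the fast front-end while bulk data sits in DNA.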
Academic Significance
This is a seminal effort in making DNA data storage practically accessible to computer systems. It takes DNA storage out of the lab realm of offline batch processing and imagines it as part of an online storage hierarchy. The academic contribution lies in identifying and solving systems-level challenges: how to handle address translation when DNA writes are immutable, how to integrate a device with hours or days of latency into an OS expecting millisecond responses (through caching and prediction), and how to maintain reliability. It lays groundwork for “biological storage systems” in computer science, extending storage research to include entirely new media. This paper is likely one of the first to outline how a DNA storage device could plug into an OS, which is a significant interdisciplinary breakthrough.
Industry Significance
Although DNA storage is still experimental, LiqSD is forward-looking for archival and cold storage solutions. In industry terms, it sketches a path to an archival store that could hold exabytes in a small physical footprint at low cost (once DNA synthesis/sequencing costs drop). By presenting a block device interface, it means future data centers could potentially use DNA drives in place of tape libraries or cold-data HDDs, without changing applications. The use of an SSD as a front-end means performance-critical data is served normally, while older or less-used data migrates to DNA in the background – an approach industry could gradually adopt as DNA technology matures. If realized, this could revolutionize long-term data retention (for example, an entire data archive could be stored in a few cubic centimeters of liquid).
DNA Data Storage: A Generative Tool for Motif-based DNA Storage
Main Innovation
Introduces a generative design tool for DNA storage sequences that uses motif-based rules to optimize DNA strands for data encoding. In DNA storage, not all sequences are equal – some patterns (motifs) can cause biochemical errors (like secondary structures or PCR bias). This tool uses a generative algorithm (potentially AI or combinatorial search) to produce candidate DNA coding sequences that avoid undesirable motifs and satisfy constraints like GC content, homopolymer run limits, and error-correcting code requirements. Essentially, it automates the creation of DNA codes that are tailor-made for robustness and efficiency, rather than using random or purely heuristic-designed sequences.
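The constraint-checking side of such a tool is easy to sketch. The following is a hedged illustration (the forbidden motifs, GC range, and homopolymer limit are invented examples, and the paper's generative algorithm is far more sophisticated than brute-force enumeration): it filters candidate strands against motif, GC-content, and homopolymer constraints:

```python
# Hedged sketch of a motif-aware sequence filter: keep only candidate
# DNA strands that avoid forbidden motifs, stay within a GC-content
# range, and cap homopolymer runs. All thresholds here are illustrative.

import itertools

FORBIDDEN_MOTIFS = ["GGC", "TATA"]   # example problem motifs (made up)
GC_RANGE = (0.4, 0.6)                # acceptable GC fraction
MAX_HOMOPOLYMER = 2                  # longest allowed run of one base

def is_valid(seq):
    # Reject sequences containing any biochemically troublesome motif.
    if any(m in seq for m in FORBIDDEN_MOTIFS):
        return False
    # Enforce balanced GC content (affects melting temperature, PCR bias).
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not (GC_RANGE[0] <= gc <= GC_RANGE[1]):
        return False
    # Cap homopolymer runs, which cause sequencing errors.
    run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        if run > MAX_HOMOPOLYMER:
            return False
    return True

def candidate_codewords(length):
    """Exhaustively enumerate valid codewords of a (small) length."""
    for bases in itertools.product("ACGT", repeat=length):
        seq = "".join(bases)
        if is_valid(seq):
            yield seq
```

A real generative tool would search or learn over this constrained space rather than enumerate it, and would additionally weave in error-correcting-code structure, but the validity predicate captures the kind of biochemical rules the paper encodes.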
Academic Significance
This work sits at the intersection of coding theory, synthetic biology, and storage systems. It provides a systematic way to generate encoding schemes for DNA storage, whereas prior efforts often relied on manually crafted rules or adapting traditional error-correcting codes. By focusing on motifs, it acknowledges and addresses the domain-specific challenges of DNA (like certain substrings causing issues in synthesis or sequencing). Academically, it contributes a methodology to ensure reliability of DNA storage before experimental tests – essentially, in-silico verification of sequences. This reduces trial-and-error in DNA storage research and pushes it closer to practical viability by ensuring data encodings are biologically sound and efficient.
Industry Significance
As DNA storage approaches practical use, having a generative tool means companies can quickly adapt DNA storage to new use cases or devices. For example, if a new sequencing machine has trouble with certain motifs, this tool could generate a new set of DNA encodings that avoid those, without overhauling the whole system. It also potentially improves data density and accuracy: better sequences mean fewer errors and retries, which lowers cost. In the long run, if DNA storage is commercialized, such a tool would be part of the “compiler” or middleware that takes user data and encodes it into DNA. It ensures high reliability and longevity of stored data by embedding biochemical resilience directly into the data encoding stage.
GeminiFS: A Companion File System for GPUs
Main Innovation
The core innovation is GeminiFS, a “companion” file system that enables GPUs to access files directly from NVMe storage with high performance while leveraging the host file system for metadata management. Unlike existing GPU-centric storage solutions (like BaM) which lack file abstractions, or CPU-centric ones which suffer from synchronization overhead, GeminiFS embeds file metadata directly into files on the host side. This design allows the GPU to retrieve metadata and access data directly via NVMe queues without CPU intervention for the data path. Additionally, it introduces a parallel control plane mechanism by extending the NVMe driver to allow both CPU and GPU to manage storage queues concurrently, along with a software-defined, GPU-friendly page cache to maximize internal GPU bandwidth.
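The embedded-metadata idea can be modeled in a few lines. The sketch below is a simplified assumption-laden illustration (block size, on-disk layout, and class names are all invented; GeminiFS's actual format and its NVMe queue extensions are not shown): the host side writes a file's extent map into the file's first block, so the accelerator can fetch that block once and then translate offsets to device blocks itself, reading data directly without calling back into the host FS:

```python
# Toy model of embedded file metadata: the host FS stores the extent map
# in the file's first block; the "GPU" reads it once, then performs its
# own offset-to-block translation and direct device reads.

BLOCK = 16                              # toy block size (made up)

class RawDevice:
    def __init__(self, nblocks):
        self.blocks = [b"\x00" * BLOCK for _ in range(nblocks)]
    def read_block(self, lba):
        return self.blocks[lba]
    def write_block(self, lba, data):
        self.blocks[lba] = data.ljust(BLOCK, b"\x00")

class HostFS:
    """CPU side: lays out data and embeds the extent map in block 0."""
    def create_file(self, dev, start_lba, data):
        extents = []
        lba = start_lba + 1             # first block holds the metadata
        for off in range(0, len(data), BLOCK):
            dev.write_block(lba, data[off:off + BLOCK])
            extents.append(lba)
            lba += 1
        meta = ",".join(map(str, extents)).encode()
        dev.write_block(start_lba, meta)
        return start_lba                # "file handle" = metadata LBA

class GpuReader:
    """Accelerator side: one metadata fetch, then direct data reads."""
    def __init__(self, dev, handle):
        meta = dev.read_block(handle).rstrip(b"\x00").decode()
        self.extents = [int(x) for x in meta.split(",")]
        self.dev = dev
    def read(self, offset, length):
        out = b""
        while len(out) < length:
            idx, within = divmod(offset + len(out), BLOCK)
            blk = self.dev.read_block(self.extents[idx])
            out += blk[within:BLOCK]
        return out[:length]
```

The point of the model is the division of labor: the CPU manages layout and metadata once at file creation, and every subsequent data-path read is resolved entirely on the accelerator side, which is the synchronization-free data path GeminiFS targets.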
Academic Significance
GeminiFS addresses the critical gap between raw storage access performance and the need for high-level file abstractions in GPU computing. It advances the state of the art by demonstrating how to decouple metadata management (handled by the CPU) from data access (handled by the GPU) in a shared storage environment, effectively overcoming the traditional trade-off between functionality and performance. This work opens new research directions for “companion” architectures where accelerators like GPUs can operate as semi-autonomous I/O entities, potentially influencing future designs of heterogeneous operating systems and storage stacks for data-intensive AI/ML workloads.
Industry Significance
For industry, GeminiFS offers a practical solution to the memory capacity bottleneck in large-scale ML applications (such as LLMs and GNNs) by enabling efficient, file-based storage expansion. By permitting direct, high-throughput file access from GPUs, it significantly improves the performance of I/O-intensive training and inference tasks compared to current state-of-the-art solutions. This technology could be integrated into AI infrastructure and cloud platforms to allow larger models to run on fewer GPUs or with cheaper storage-based memory expansion, directly improving cost-efficiency and scalability for commercial AI deployments.
Trends and Emerging Directions in FAST ‘25 Innovations
Collectively, the FAST ‘25 papers highlight several notable trends and breakthroughs in file and storage technologies:
Harnessing New Hardware Paradigms
Many papers rethink storage design in light of emerging hardware. For example, the integration of byte-addressable NVM into disk filesystems (NVLog), CXL and multi-core influences on journaling (DJFS), GPUs directly managing persistent memory (GPHash), and even DNA as storage media (Liquid-State Drive). This reflects a broad direction: storage systems are being redesigned to exploit hardware parallelism (GPUs, many-core), low-latency memory, and extreme-density media.
Blurred Boundaries Between Storage and Computation (AI/ML)
A number of works combine storage system thinking with AI/ML workloads. Mooncake and IMPRESS address how to serve Large Language Models more efficiently by using storage (caches, multi-tier memory) to offload computation. FusionANN splits vector search across CPU and SSD/GPU to handle huge datasets. This indicates an emerging direction where storage hierarchy is leveraged to accelerate AI – essentially treating model states, embeddings, and caches as a new kind of “data” to be optimized by storage techniques.
Parallelism and Concurrency in Filesystems
Several innovations (Ananke, DJFS, ScaleLFS, PolyStore) focus on eliminating serialization and fully utilizing multi-core and multi-device environments. Whether it’s parallel crash recovery, concurrent journaling, parallel log cleaning, or striping across heterogeneous devices, the trend is scaling file/storage performance by concurrency rather than solely faster hardware. This is crucial as core counts and device counts rise.
Cloud-Native Storage Insights and Optimizations
The influence of cloud workloads is evident. Cloudscape provides a reality check on cloud storage usage (heavy object storage and serverless integration), guiding researchers to target relevant problems. FlacIO tackles container deployment speed – a key cloud operations issue – by redesigning image distribution. The emphasis is on lowering latency and overhead in distributed storage services, aligning with industry’s push for agility (fast scaling, function-as-a-service, etc.).
Data Reduction and Efficiency
From VectorCDC speeding up deduplication to NCBlob reducing repair bandwidth with network coding, there’s a theme of doing more with less. Storage systems are incorporating advanced coding (erasure codes, generative DNA codes) and algorithmic optimizations (SIMD chunking, ADS sketches) to save space or time. This signals that as data volumes explode, storage efficiency techniques (dedup, compression, coding) remain a hotbed of innovation, now often accelerated by hardware or smarter algorithms.
Emergence of Approximate and Adaptive Methods
A subtle but important trend is the acceptance of approximation and adaptivity in storage systems. Oasis willingly trades exact answers for speed on huge graphs. IMPRESS decides to cache “important” data rather than everything. Archer compresses memory selectively based on usage patterns. This reflects a maturing perspective that storage systems can be intelligent, not just exact bit-stores – they can leverage workload patterns or allow slight approximations for big gains in performance or capacity.
Summary
In summary, FAST ’25 shows a landscape where storage technology is more heterogeneous and integrated than ever. Researchers are breaking traditional layers: embedding new technologies like NVM, GPUs, and DNA into the stack, fusing storage with computation for AI, and using global knowledge (of workloads or system state) to manage data more intelligently. The breakthroughs point toward storage systems that are highly adaptable, extremely scalable, and deeply optimized for the hardware and workloads of the coming decade.