Published in Byte-Sized AI

- A Use Case of Disaggregated Architecture for LLM Serving: Mooncake (Feb 21)
- [DeepSeek Review] Exploring DeepSeek Architecture: Multi-head Latent Attention (MLA) and… (Feb 2)
- [Memory] Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator (Jan 29). Ramulator 2.0 is a highly modular and extensible DRAM simulator designed to enable rapid and agile implementation and evaluation of design…
- [GPU] Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling (Jan 24). Accel-Sim is a simulation framework designed to simplify modeling and validating future GPUs. It features a flexible frontend that switches…
- [Inference Compute Scaling] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (Jan 8)
- [AI Agents] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Dec 31, 2024). In this blog post series on AI agents, we review the paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, a…
- [vLLM — Prefix KV Caching] vLLM’s Automatic Prefix Caching vs ChunkAttention (Dec 25, 2024). ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
- AMD MI300X vs. Nvidia H100/H200 — Training Performance Comparison (Dec 24, 2024). H100/H200 offers higher training performance at lower costs than MI300X
- vLLM Joins the PyTorch Ecosystem (Dec 21, 2024). vLLM Joins the PyTorch Ecosystem and supports Amazon Rufus AI Shopping Assistant