Published in Byte-Sized AI:

- NVIDIA FFN Fusion: Parallelizing the Core of LLMs for 35× Cheaper Inference (1d ago)
- Nemotron-H: Advancing Inference Efficiency and Accuracy with Hybrid Mamba-Transformer Models. "Nemotron-H is a new family of hybrid Mamba-Transformer large language models (LLMs) designed to enhance reasoning performance and inference…" (2d ago)
- OpenAI Introduces 4o Image Generation: A Native, Autoregressive Alternative to Diffusion Models (Mar 28)
- China's AI Surge: DeepSeek Releases Upgraded Model, Ant Group Trained 300B LLM on Local Chips. "Ant Group Cuts LLM Training Costs by 20% Using Domestic GPUs" (Mar 28)
- Hunyuan-T1's Mamba Breakthrough, WEKA's Storage-Level KV Offloading, and More. "AI Brief 3/23/2025" (Mar 24)
- A Use Case of Disaggregated Architecture for LLM Serving: Mooncake (Feb 21)
- [DeepSeek Review] Exploring DeepSeek Architecture: Multi-head Latent Attention (MLA) and… (Feb 2)
- [Memory] Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator. "Ramulator 2.0 is a highly modular and extensible DRAM simulator designed to enable rapid and agile implementation and evaluation of design…" (Jan 29)
- [GPU] Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. "Accel-Sim is a simulation framework designed to simplify modeling and validating future GPUs. It features a flexible frontend that switches…" (Jan 24)
- [Inference Compute Scaling] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (Jan 8)