Published in Byte-Sized AI:

- NVIDIA FFN Fusion: Parallelizing the Core of LLMs for 35× Cheaper Inference (1d ago)
- Nemotron-H: Advancing Inference Efficiency and Accuracy with Hybrid Mamba-Transformer Models. "Nemotron-H is a new family of hybrid Mamba-Transformer large language models (LLMs) designed to enhance reasoning performance and inference…" (2d ago)
- OpenAI Introduces 4o Image Generation: A Native, Autoregressive Alternative to Diffusion Models (Mar 28)
- China's AI Surge: DeepSeek Releases Upgraded Model, Ant Group Trained 300B LLM on Local Chips. "Ant Group Cuts LLM Training Costs by 20% Using Domestic GPUs" (Mar 28)
- Hunyuan-T1's Mamba Breakthrough, WEKA's Storage-Level KV Offloading, and More. "AI Brief 3/23/2025" (Mar 24)
- A Use Case of Disaggregated Architecture for LLM Serving: Mooncake (Feb 21)
- [DeepSeek Review] Exploring DeepSeek Architecture: Multi-head Latent Attention (MLA) and… (Feb 2)
- [Memory] Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator. "Ramulator 2.0 is a highly modular and extensible DRAM simulator designed to enable rapid and agile implementation and evaluation of design…" (Jan 29)
- [GPU] Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. "Accel-Sim is a simulation framework designed to simplify modeling and validating future GPUs. It features a flexible frontend that switches…" (Jan 24)
- [Inference Compute Scaling] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (Jan 8)