Published in Byte-Sized AI

- OpenAI’s Custom AI Chips, Orion, and ChatGPT Search; Saudi Arabia’s $100B AI Initiative, and… · AI News Key Headlines — 11/13/2024 · Nov 14
- Microsoft’s Multi-Agent AI Framework; Amazon Deepens Investment in Anthropic; Apple Pursues AI… · Key Headlines — 11/10/2024 · Nov 11
- LLM Inference - Optimizing the KV Cache for High-Throughput, Long-Context Inference (ShadowKV) · ShadowKV enables larger decoding batch sizes and higher throughput by freeing up GPU memory previously occupied by the KV cache. · Nov 9
- Anthropic’s Claude 3.5 Upgrades and API for Computer Navigation; Supermicro Faces Setbacks · AI Brief Headlines — 11/06/2024 · Nov 8
- LLM Inference - KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving (DejaVu) · Distributed LLM serving is costly and often underutilizes hardware accelerators due to three main challenges. · Nov 7
- Accelerating Long Context Generation with KV Cache Offloading to CPU Memory, Using InfiniGen · InfiniGen: An Efficient KV Cache Offloading and Sparse Attention Technique · Nov 6
- Perplexity AI Nearing $500 Million Funding Round; Apple Facing Challenges in China; SK hynix and… · AI News Brief Headlines — 11/04/2024 · Nov 6
- OpenAI and Broadcom Developing AI Chips; Huawei and SMIC Expanding AI HW Capabilities; Explosive… · AI Brief Headlines — 10/31/2024 · Nov 3
- On-device AI — MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases · Running LLMs on mobile devices presents significant challenges due to the resource constraints of mobile SoCs. In particular, the 6 to 12… · Oct 25
- On-device AI — Efficient Large Language Model Deployment with Limited Memory Using Flash Storage… · Apple LLM in a Flash · Oct 20