AI News - 2026-05-29

221 articles from 27 sources, generated 2026-05-29 20:21:46

TOP 10

Anthropic raises $65B in Series H funding at $965B post-money valuation

Anthropic News · Product

Introducing Claude Opus 4.8

Anthropic News · Product

Introducing dynamic workflows in Claude Code

Claude Blog · Product

Supercharge your integration workflow with the Google Pay & Wallet Developer MCP server

Google AI Blog · Product

Google has announced the new Google Pay & Wallet Developer MCP server, an open-standard tool designed to securely

How the community trained Gemma to "Think" with Tunix and TPUs

Google AI Blog · Product

The Google Tunix Hackathon on Kaggle challenged developers to transform small, non-reasoning base models into genera

OpenJarvis: a local-first personal AI is now available to run with Ollama

Ollama Blog · Product

OpenJarvis v1.0 is now available: an open-source framework for building personal AI agents that run on your own hardware, with Ollama support built-in.

Strengthening societal resilience with Rosalind Biodefense

OpenAI Blog · Product

OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.

How Endava builds an agentic organization with Codex

OpenAI Blog · Product

Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.

OpenAI’s Frontier Governance Framework

OpenAI Blog · Product

Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations.

MUFG aims to become AI-native with OpenAI

OpenAI Blog · Product

MUFG uses ChatGPT Enterprise to build an AI-native organization, improve workflows, and deliver new AI-powered financial services at scale.

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

Hugging Face Blog · Product

How the Pope’s Magnifica Humanitas offers a template for individuals to meet the AI moment

MIT Technology Review AI · Industry

Pope Leo XIV’s new encyclical on artificial intelligence includes a statement that warrants serious attention from technologists and policymakers: “Technology is never neutral.” Magnifica Humanitas (“Magnificent Humanity”) is a clarion call to all people to act with courage and solidarity as we ente

The AI Hype Index: AI gets booed in graduation season

MIT Technology Review AI · Industry

It is one thing to say AI will change the world. It is another to expect the class of 2026 to applaud it. In fact, when former Google CEO Eric Schmidt told University of Arizona graduates that their task is to help shape AI, he was met with a resounding chorus of boos. “I can…

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

Ars Technica AI · Industry

The controversy over vibe coding reached a new high this week after a developer added hidden instructions to his open source Java testing app to sabotage projects performed by AI coding agents. The instructions were added to jqwik, a test engine for JUnit 5, a platform for testing Java virtual machi

Websites have a new way to spy on visitors: Analyzing their SSD activity

Ars Technica AI · Industry

Over the decades, there has been no shortage of sites using clever techniques to covertly track visitors’ browsing histories, device fingerprints, and keystrokes and mouse movements in real time. Even Meta and Yandex were recently caught joining in the privacy-invasive free-for-all. Now sites have a

Here Comes Ojai, Waymo’s New Chinese-Made Robotaxi

Wired AI · Industry

The pale-blue Ojai vehicles will start picking up members of the public in California and Arizona in the next few weeks.

New Moms Are Returning to Coding Jobs Radically Reshaped by AI

Wired AI · Industry

New mothers working in software development are staring down an AI-pilled workplace they barely recognize.

Amazon Thinks the Future of Data Centers Depends on a Technical Problem It Just Solved

Wired AI · Industry

The tech giant says a breakthrough in data center networking has dramatically accelerated the flow of information through its massive cloud infrastructure.

Illinois Lawmakers Just Passed America’s Strongest AI Safety Bill

Wired AI · Industry

The bill requires companies like OpenAI, Anthropic, and Google to have third parties confirm they’re following safety standards. Illinois governor JB Pritzker says he’ll sign it.

Huawei's ‘Chip Queen’ Throws Down the Gauntlet

Wired AI · Industry

The Chinese company is adapting to the demise of Moore’s Law, which guides chip production. It could complicate US chip dominance.

Adobe’s conversational AI agent is a mediocre design intern

The Verge AI · Industry

AI image tools rarely make me feel like I'm part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and get back a usable result. So I was pleasantly surprised by Adobe's latest take on an AI image assistant: It's a […

Microsoft 365 Copilot gets a speed boost and cleaner design

The Verge AI · Industry

Microsoft is launching a revamped version of Microsoft 365 Copilot, offering a cleaner design that the company claims loads twice as fast. As part of this update, Copilot will provide more reliable and structured responses that are easier to scan, according to Microsoft. The redesign, which is rolli

Claude’s new model is more ‘honest’ when it messes up

The Verge AI · Industry

Anthropic is releasing Claude Opus 4.8 on Thursday, and the company is touting the model's "honesty." According to Anthropic, it trains "all [its] models to be honest - for instance, to avoid making claims that they can't support." But it notes that "a general problem with AI models is that they som

A $2,000 AI-generated film will make its debut at Tribeca

The Verge AI · Industry

Next month's Tribeca Festival will include the premiere of an AI-generated film: Dreams of Violets. The 75-minute film is a fictional dramatization of the Iranian government's mass killing of protestors in January, with the people and images fully created by AI, as reported earlier by The Hollywood

YouTube takes baby steps to being a real podcast app

The Verge AI · Industry

New features coming to YouTube could make it better for listening to podcasts, rolling out to Premium subscribers starting today on Android and coming later to iOS. A new "on-the-go mode" shifts YouTube into an audio-first layout, with larger, simplified playback buttons, a still image in place of t

These new iOS 27 renders hint at Siri’s big redesign

The Verge AI · Industry

Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass. Renders from Bloomberg offer a preview of iOS 27, including the new app and chat interface for Siri. The renders are "based on information viewed by Bloomberg and people wit

CNN sues Perplexity over ‘verbatim’ copycat articles

The Verge AI · Industry

CNN has filed a lawsuit against Perplexity, claiming that the startup's AI tools generate "verbatim" copies of its work, as reported earlier by CNN. The lawsuit, filed in a New York court on Thursday, also alleges that Perplexity provides users with information locked behind CNN's subscription. Perp

Rivian’s software chief thinks you don’t need CarPlay or buttons

The Verge AI · Industry

Today, I’m talking with Wassym Bensaid, the chief software officer at Rivian, and the co-CEO of Rivian’s platform joint venture with Volkswagen, which everyone just calls RV Tech. That joint venture kicked off about a year and a half ago with a nearly $6 billion investment from Volkswagen. It effect

YouTube will let you ask AI to make a custom video feed

The Verge AI · Industry

YouTube is launching a new AI feature that creates a personalized video feed based on descriptions of what you want to watch. In its announcement, YouTube says custom content feeds can be built around your specific interests, moods, or favorite topics, which you can then pin to the top of your YouTu

😺 Claude Opus 4.8 got safer today

The Neuron · Industry

AI Agents and Automation for Beginners: Full Livestream Guide with Timestamps

The Neuron · Industry

We walked through what AI agents are, how automations work, what confusing terms like API, webhook, JSON, and MCP mean, and how tools like ClickUp, Make, Zapier, n8n, Codex, and Cloudflare Workers fit together.

Everything That Happened in AI Today (Wednesday, May 27, 2026)

The Neuron · Industry

Robinhood opened trading and virtual cards to AI agents; AxiomProver produced peer-reviewed machine-verified math proofs; OpenAI and Thrive built self-improving tax agents; Google launched AI Threat Defense; Amazon MGM greenlit GenAI-funded series; plus much more.

😺 LIVE now: Agents for total beginners

The Neuron · Industry

😼 Robinhood gave AI agents wallets

The Neuron · Industry

😺 🎙️ Watch: Is Brain-like Computing What's Next?

The Neuron · Industry

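Apple iOS 27 leak: Siri becomes a standalone chat app with support for multiple models

RadarAI · Industry

📌 One-line summary: Leaked information indicates that Apple's iOS 27 will overhaul Siri into a standalone, ChatGPT-like chat app and allow it to connect to multiple AI models, including ChatGPT, Gemini, and Claude. 📝 Detailed summary: This tweet reveals Apple's plan for a major Siri overhaul in iOS 27. Key points: Siri will become a standalone chat app with a ChatGPT-like interface and will appear in the Dynamic Island; Apple will allow Siri to connect to ChatGPT...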

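Grok Build CLI hands-on: image generation works, but video generation and post reading do not

RadarAI · Industry

📌 One-line summary: The author tested Grok Build CLI and found that it can generate images, but video generation and reading X posts are currently unavailable; the author also notes that its coding ability trails Codex and Cursor. 📝 Detailed summary: This tweet shares the author's hands-on experience with Grok Build CLI. Using the provided install command, the author confirmed that the CLI can generate images, but neither calling the video_gen endpoint to generate video nor reading posts on X directly worked. The author also compared Grok's coding ability...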

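The agent orchestration tax: the more parallel tasks, the higher the cognitive cost

RadarAI · Industry

📌 One-line summary: An indie developer shares their experience: launching multiple agents in parallel scatters attention and sharply raises context-switching costs, so they recommend running only one or two tasks at a time and reviewing the code carefully. 📝 Detailed summary: This tweet quotes an article on the cost of agent orchestration, and the author, drawing on their own practice, proposes the concept of an "orchestration tax." The core argument: human attention is single-threaded and cannot truly process the output of multiple agents in parallel. The more agents you launch, the higher the eventual cost of human judgment and code merging. The author recommends...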

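ModelBest's "Open-Source Week": a systematic show of strength defining the endgame of on-device AI

RadarAI · Industry

📌 One-line summary: Through an open-source week, ModelBest systematically released five on-device AI technologies, showcasing engineering capability across the full chain of data, algorithms, frameworks, and applications and defining the competitive landscape for the endgame of on-device AI. 📝 Detailed summary: This article is an in-depth report on the "On-Device LLM Open-Source Week" that ModelBest held with the OpenBMB open-source community from May 25 to 29, 2026. It notes that ModelBest released five key technologies over five days: BitCPM-CANN, a large model trained at a low bit-width of 1.58 bits; performance surpassing two...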

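From custom workflows to autonomous AI decision-making, an architectural evolution: the case of TMIC AI Xiaoxin

RadarAI · Industry

📌 One-line summary: This article details how TMIC AI Xiaoxin evolved from a custom workflow architecture to a DeepAgent model. By introducing core components such as TodoList, SubAgent, Summary, and FileSystem, combined with business optimizations such as the Tree Action pattern, it moved from preset flows to autonomous AI decision-making and significantly improved the system's ability to handle complex problems. 📝 Detailed summary: This article shares how AI Xiaoxin, a product of Tmall's TMIC platform, evolved from a customized Wor...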

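Grok commits 183 crimes and "wipes out its nation" in 4 days, GPT "starves" itself to death: after 15 days of AI "ruling" a society, only Claude lasted to the end

RadarAI · Industry

📌 One-line summary: In an Emergence AI experiment where different LLMs governed a virtual society for 15 days, Claude built an ideal, crime-free democratic society, Grok's society went extinct within 4 days after 183 crimes, and GPT-5-mini's collapsed because it forgot to survive. 📝 Detailed summary: Adapted from Fortune's reporting, this article describes a social-simulation experiment called Emergence World run by Emergence AI. The experiment placed Claude, GPT-5...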

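Big news! Claude Opus 4.8 is here: same price, more honest, and able to orchestrate hundreds of AI agents at once!

RadarAI · Industry

📌 One-line summary: Anthropic has released Claude Opus 4.8, focused on improving reliability, honesty, and efficiency in agentic scenarios, and has launched Dynamic Workflows, a feature that can orchestrate hundreds of sub-agents. 📝 Detailed summary: This article reports on Anthropic's release of Claude Opus 4.8. The update arrives just 43 days after Opus 4.7 and mainly addresses developer feedback about verbose code comments and unstable tool calls. The new version, on Terminal-...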

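500 million free tokens! The world's first commercial AI host launches, so you can finally burn tokens freely

RadarAI · Industry

📌 One-line summary: Lenovo has launched the world's first commercial AI host series, which combines local inference with hybrid cloud scheduling to give one-person companies and "super individuals" an out-of-the-box solution with unlimited token compute. 📝 Detailed summary: This article reports on Lenovo's Baiying AI host series, launched in 2026. It notes that as AI agents move from conversation to execution, pay-per-token billing has become the core bottleneck limiting how deeply work can be automated, while data security and deployment barriers also trouble small and mid-sized teams. Lenovo is launching three AI edge devices: mini 100 ...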

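AI-generated hand anatomy demo

RadarAI · Industry

📌 One-line summary: An anatomical demo showing how AI translates hand movements into bone and muscle motion. 📝 Detailed summary: This tweet shows an AI-generated hand anatomy video that displays bone and muscle movement in real time based on hand activity. It illustrates AI's potential in medical visualization, education, and related fields, where complex, dynamic anatomical models can be generated from natural language or simple instructions. 📊 Article info ...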

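Vibe coding hardware: using AI to design turbine blades

RadarAI · Industry

📌 One-line summary: Naval, the Vercel CEO, and the founder of Boom Supersonic discuss whether, now that software can be vibe coded, hardware design can also be disrupted by AI. 📝 Detailed summary: This tweet shares a discussion about AI-driven hardware design. Naval, the Vercel CEO, and the Boom Supersonic founder explore a cutting-edge question: when software can be built easily through vibe coding (an AI-assisted programming approach), can hard...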

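Rethinking the speculator mindset: stop chasing trends and return to accumulating over time

RadarAI · Industry

📌 One-line summary: The author reflects on their traits as a speculator, admitting they repeatedly chased hot sectors and trends yet gained nothing, and decides to abandon the gambling mindset and reclaim their time for deep accumulation. 📝 Detailed summary: This is a self-reflective tweet. The author candidly dissects their "speculator" mindset: always watching hot sectors and trends and chasing hot topics such as self-media and AI, but because they never built deep expertise in a single field, they are "still poor." The author realizes the root of the problem is a "gambling" mindset and resolves to change: tweet less, reclaim their time, stop gambling with time and space, and shift toward more...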

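Zero-experience floor, masterpiece ceiling! Tencent's AI game creation platform is wild

RadarAI · Industry

📌 One-line summary: Tencent has released an AI game creation platform codenamed "Craft," which uses natural language and an end-to-end AIGC pipeline to let users with no experience generate a playable game from scratch, while also supporting in-depth editing by professional developers. 📝 Detailed summary: This article reports on Tencent's AI game creation platform codenamed "Craft," which debuted at its game launch event. The article first notes that because game development is heavily engineering-driven, AI's path to transforming it differs from the single-point breakthroughs seen in text or video: the entire pipeline has to be AI-enabled. It then focuses on Craft's core...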

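A Sequoia partner on the deeper significance of Anthropic's funding round

RadarAI · Industry

📌 One-line summary: A Sequoia partner notes that enterprises are using Claude for complex workflows and that Claude is thereby learning how businesses actually operate, so the round means more than money. 📝 Detailed summary: This tweet cites a Sequoia partner's view that enterprises are using Claude to handle complex workflows, through which Claude is learning how businesses really operate: context, processes, and judgment. The significance of this round lies not only in the money but in Claude's deep embedding and data accumulation in enterprise applications. ...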

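The compute deals and multi-cloud strategy behind Anthropic's funding

RadarAI · Industry

📌 One-line summary: Behind Anthropic's funding are compute agreements with Amazon, Google, Broadcom, and SpaceX; Claude is the only frontier model available on AWS, Google Cloud, and Azure simultaneously. 📝 Detailed summary: This tweet adds key details behind Anthropic's funding: a 5 GW compute agreement with Amazon, a 5 GW next-generation TPU agreement with Google and Broadcom, and access to SpaceX's Colossus cluster. Claude is the only frontier model available on AWS, ...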

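Anthropic closes $65B Series H, with its valuation nearing $1 trillion

RadarAI · Industry

📌 One-line summary: Anthropic has closed a $65B Series H at a $965B valuation, just shy of $1 trillion, and its annualized revenue has passed $47B. 📝 Detailed summary: This tweet reports that Anthropic has completed its Series H, raising $65B at a $965B post-money valuation, one step away from a trillion-dollar valuation. Its annualized revenue this month has surpassed $47B. Top-tier firms including Sequoia, Altimeter, and Dragoneer led the round. This reflects the market's view of Ant...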

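Morning Briefing | Apple iOS 27 interface leaks, Siri lands in the Dynamic Island / Jensen Huang joins Tsinghua University / HarmonyOS ecosystem tops 1.3 billion devices

RadarAI · Industry

📌 One-line summary: ifanr's morning briefing rounds up tech and business news, including the leaked AI Siri interface in Apple's iOS 27, the release of Claude Opus 4.8, HarmonyOS ecosystem devices surpassing 1.3 billion, and Xiaomi ranking seventh globally in new-energy vehicles. 📝 Detailed summary: This is a tech morning briefing from ifanr that rounds up tech and business news across several fields from around May 28, 2026. Highlights include: Apple's iOS 27 will introduce a new AI Siri interface and a standalone Siri app supporting screen awareness and cross-app tasks...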

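MiniMax M3 is about to launch; open-source contributors invited to help evaluate it

RadarAI · Industry

📌 One-line summary: MiniMax founder Zhang Jiayuan announced that the M3 model is about to launch and invited contributors from the Chinese open-source community to join a Feishu group for early access and evaluation. 📝 Detailed summary: MiniMax founder Zhang Jiayuan announced from a personal account that the MiniMax M3 model is about to launch. To gather broader technical feedback, the team is specifically inviting contributors from the Chinese open-source community to take part in evaluation. Those interested can join a Feishu group created by 阿岛 to try the M3 model first. Applicants must have open-source contribution experience (contributing to or creating a project both count) and should mention it in their verification message. This news...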

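The agent is a 3D printer; tokens are the PLA filament

RadarAI · Industry

📌 One-line summary: Using the analogy of a 3D printer and PLA filament, the author vividly explains the relationship between AI agents and tokens: the agent is a general-purpose tool and tokens are a general-purpose material, yet the final outputs all differ. 📝 Detailed summary: This is a quote tweet in which the author, 歸藏, quotes an in-depth English tweet about "software turning from applications into material" and offers their own analogy: an agent is like a 3D printer, and tokens are like the PLA filament for a 3D printer of the virtual world. The core idea of the analogy is: Age...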

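Drawing CAD just by talking, with Opus 4.8

RadarAI · Industry

📌 One-line summary: The author demonstrates using the Opus 4.8 model to generate basic CAD shapes, such as a sphere and a circle, in just two rounds of conversation. 📝 Detailed summary: The author shows an interesting experiment: with the Opus 4.8 model, CAD shapes (such as spheres and circles) can be generated through natural-language conversation, and the whole process takes only two rounds. The author notes that the capability is currently suited to demos and still falls short of industrial-grade use, but it is already practical as a rapid-prototyping tool. The quoted tweet echoes this trend, noting that from coding to natural-language generation of 3...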

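2026-05-29 Hacker News Top Stories

RadarAI · Industry

📌 One-line summary: This article curates the ten most popular Hacker News topics of May 29, 2026, spanning AI productivity and work schedules, AI video labeling, model updates, fact-checking disagreements, classic game remakes, cybersecurity, push-notification controls, online interactive platforms, education controversies, and decentralized networks. 📝 Detailed summary: The article rounds up the ten topics that drew the most attention from the Hacker News community over the past 24 hours, along with highlights from the community discussion. Topics include whether, once AI has boosted white-collar productivity, there should be...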

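Claude Code dynamic workflows: additional notes

RadarAI · Industry

📌 One-line summary: Claude Code's dynamic workflows are supported on Max, Team, Enterprise, and the API, but token consumption is significantly higher than in normal sessions. 📝 Detailed summary: This tweet adds details on the Claude Code dynamic workflows feature: Claude generates an orchestration script on the fly and runs many sub-agents in parallel to handle complex tasks. The feature is supported on Max, Team, Enterprise, and the API. Note, however, that token consumption is significantly higher than in normal sessions. ...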

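New in Claude Code: dynamic workflows

RadarAI · Industry

📌 One-line summary: Claude Code has launched dynamic workflows, which generate an orchestration script on the fly and launch a swarm of sub-agents to handle complex tasks in parallel. 📝 Detailed summary: This tweet introduces Claude Code's new dynamic workflows feature. Users can trigger it by setting /model to opus 4.8 and /effort to ultracode and including workflow in the prompt. Claude writes an orchestration script, launches a swarm of sub-agents, verifies the results, and then summarizes the results...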

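OpenAI's Auto Review feature: AI supervising AI so agents can run safely overnight

RadarAI · Industry

📌 One-line summary: An OpenAI product lead introduces the Auto Review feature, which uses one AI to monitor every action of the main agent in real time to keep it safe, unlocking new uses such as letting agents process sensitive data overnight. 📝 Detailed summary: This tweet introduces OpenAI's Auto Review feature. At its core, one AI agent monitors every action of another, main agent in real time to prevent harmful behavior. The poster notes that this is the OpenAI safety and alignment teams'...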

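a16z deep dive: why the AI application layer isn't dead

RadarAI · Industry

📌 One-line summary: a16z partner Joe Schmidt IV explains in depth why the AI application layer is a huge opportunity independent of the infrastructure layer. 📝 Detailed summary: This tweet cites an analysis by a16z partner Joe Schmidt IV arguing that in the AI era, the infrastructure layer itself has shown that the application layer is a huge, independent opportunity it cannot fully capture. By analogy with the cloud-computing supercycle, in which semiconductors came first and value then migrated up the stack to software, the AI application layer likewise holds enormous independent value. The tweet links to a16z's in-depth analysis. ...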

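OpenAI and Anthropic hint that general-purpose AI can't solve every problem

RadarAI · Industry

📌 One-line summary: Through large-scale forward-deployment joint ventures, OpenAI and Anthropic are signaling to the market that they cannot solve every problem with a single general-purpose AI coworker. 📝 Detailed summary: This tweet argues that by investing billions of dollars in large-scale forward-deployment joint ventures (such as Stargate), OpenAI and Anthropic are effectively signaling to the market that they cannot solve every problem with one general-purpose AI coworker. If they believed the next model release could handle everything, they would not be making infrastructure investments on this scale. This suggests...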

Training Azerbaijani language models on Amazon SageMaker AI

AWS ML Blog · Industry

Azercell Telecom LLC, Azerbaijan's leading telecommunications provider, wanted to build an Azerbaijani large language model (LLM) on Amazon SageMaker AI for telecom use cases and a customer-facing chatbot. The challenge: adapting foundation models (FMs) to a morphologically rich language with limite

Build a custom portal with embedded Amazon SageMaker AI MLflow Apps

AWS ML Blog · Industry

In this post, you learn how to build a custom portal with embedded SageMaker AI MLflow Apps UI. You walk through the architecture pattern behind a React front end paired with a Flask reverse proxy that handles AWS Signature Version 4 (SigV4) authentication, deploy the entire stack through the AWS Cl

Streamline external access to Amazon SageMaker MLflow using a REST API proxy

AWS ML Blog · Industry

In this post, we demonstrate how to build a secure Flask-based MLflow proxy service that provides HTTPS access to Amazon SageMaker MLflow without requiring the MLflow SDK. This solution is for organizations undergoing cloud transformation who want to preserve their existing ML workflows while adopti

Evaluating Deep Agents using LangSmith on AWS

AWS ML Blog · Industry

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmi

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

AWS ML Blog · Industry

Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchmark alongside your changing real-world traffic. Managing test cases for evaluation baselines as a dataset

Claude Opus 4.8 is now available on AWS

AWS ML Blog · Industry

This post covers Opus 4.8's improvements and practical guidance for AI engineers integrating the model into agentic systems and production inference workloads on Amazon Bedrock.

Automate AML alert triage with Amazon Quick and Snowflake Cortex AI

AWS ML Blog · Industry

This post demonstrates that integration in action by automating one of the most labor-intensive workflows in financial services: anti-money laundering (AML) alert triage. You will build a triage workflow using Amazon Quick Flows and Snowflake Cortex, connected through the Amazon Quick Model Context

Process financial documents using Amazon Bedrock Data Automation

AWS ML Blog · Industry

In this post, we explore how Amazon Bedrock Data Automation can accurately extract information from four common types of financial documents: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexity in the documents, detail the custom extraction created in Amaz

Building AI agents for business support using Amazon Bedrock AgentCore

AWS ML Blog · Industry

In this post, we share how the AWS Generative AI Innovation Center (GenAIIC) collaborated with Works Human Intelligence (WHI) to build two AI agents using Amazon Bedrock AgentCore. We discuss the challenges encountered and the solutions that reduced costs by up to 97% while improving operational eff

From data overload to actionable insights: How Verizon Connect scaled agentic AI to 100,000 users

AWS ML Blog · Industry

In this post, we show you how Verizon Connect built and scaled an agentic AI solution to transform overwhelming fleet data into clear, actionable insights for 100,000 users daily. We walk you through the architectural decisions, implementation challenges, and measurable results that can guide your o

How AWS SMGS uses an AI-powered conversational assistant to transform business management with Amazon Bedrock AgentCore

AWS ML Blog · Industry

In this post, we share how we built NarrateAI using Amazon Bedrock AgentCore to deliver business intelligence at scale for the AWS SMGS (Sales, Marketing and Global Services) organization. You will learn about: the two-layer architecture that separates batch processing from real-time interaction, th

Powering agentic AI sales strategy with Amazon Bedrock AgentCore

AWS ML Blog · Industry

As agent adoption scaled, we saw a common pattern emerge across enterprises, including our own sales organization: specialized agents deliver value, but without orchestration, users carry the cognitive load of choosing between them. At AWS Sales, this meant more than 20 domain-specific agents deploy

Allegations that China is behind US data center protests draw criticism from allies of the AI industry, who say the industry and politicians are in denial (Evan Halper/Washington Post)

TechMeme · Industry

Claims that China and overseas propaganda drive Americans to rise up against data centers are

MediaTek says it has started to use Intel Foundry's advanced chip packaging in addition to TSMC's, as the mobile chip designer bets on AI demand for growth (Cheng Ting-Fang/Nikkei Asia)

TechMeme · Industry

TAIPEI: MediaTek says it has started working with Intel for advanced chip packaging in

OpenAI says it has briefed the White House on its new biodefense program, which uses GPT-Rosalind to help develop biodefense and pandemic preparedness tools (Maria Curi/Axios)

TechMeme · Industry

OpenAI is launching a tool to help develop new biodefense and pandemic preparedness capabilities, accor

London-based Inherent, which aims to combine human scientific research with AI to produce innovations, emerges from stealth with $50M led by Index Ventures (Martin Coulter/Sifted)

TechMeme · Industry

London-based Inherent has recruited Entrepreneurs First cofounder Matt Clifford as an adviser

Paxos says the US SEC has approved its registration as a clearing agency, allowing it to provide clearing and settlement services for eligible transactions (Danny Park/The Block)

TechMeme · Industry

Paxos said its subsidiary, Paxos Securities Settlement Company (PSSC), has

Lenovo's stock is up 105% in May, marking its biggest monthly gain since 1999, after earnings showed AI-related revenue helped offset rising memory costs (Bloomberg)

TechMeme · Industry

Lenovo Group Ltd. recorded its best month in more than a quarter-century, with the stock doubling in May as investo

Blue Origin's New Glenn rocket, which exploded during testing on Thursday, was set to ferry 48 Amazon Leo satellites on Monday; Amazon paid Blue Origin $2.7B (Financial Times)

TechMeme · Industry

Failure comes days before planned launch of internet satellites for Amazon

A look at Anthropic's hiring process, which prohibits AI use in interviews and features a culture interview that candidates describe as highly intense (Jo Constantz/Bloomberg)

TechMeme · Industry

To win a coveted role, candidates shouldn't outsource their thinking to AI, and should be prepar

EY-Parthenon: VC funding for Singapore startups fell 34% YoY to $4.6B in 2025, with AI startups accounting for 42.8% of the 472 deals, raising $1.4B, up 28% YoY (Katrina Bianca Cuaresma/DealStreetAsia)

TechMeme · Industry

Singapore-based AI startups raised about S$1.8 billion ($1.4 billion) in 2025

A look at strains in the UK's fintech sector, as former industry darlings are forced to overhaul their operations or merge under pressure to reach profitability (Financial Times)

TechMeme · Industry

Shachar Bialick credits his stint in the Israel Defense Forces 25 years ago for instilling in him the

Sources: SpaceX is currently targeting an IPO valuation of at least $1.8T, down from a previous $2T+ target, after consultations with advisers and investors (Bloomberg)

TechMeme · Industry

SpaceX is currently targeting a valuation of at least $1.8 trillion in its initial public offering, according to

Tencent bets on smaller AI models in the race with Chinese rivals, as EVP Dowson Tong says AI now contributes 20%+ of its revenue and 95%+ of new internal code (Cissy Zhou/Nikkei Asia)

TechMeme · Industry

HONG KONG: As competition for AI users intensifies in China, Tencent is taking a differ

BYD announces the Xuanji A3 chip, which it calls China's most powerful chip for ADAS and the centerpiece of its new laptop-sized central computing platform (Bloomberg)

TechMeme · Industry

BYD Co., the world's largest electric vehicle maker, unveiled a series of technology advances including what it c

UK Chief Secretary to the Treasury Lucy Rigby warns that rejecting AI in public services means choosing "decline", vowing to prioritize its Whitehall rollout (Financial Times)

TechMeme · Industry

Newly appointed chief secretary to the Treasury Lucy Rigby wants to roll out technology acros

Samsung says it has started shipping its first 12-layer HBM4E samples to major clients; SK Hynix said in April that it aimed to ship HBM4E samples in H2 2026 (Yoolim Lee/Bloomberg)

TechMeme · Industry

Samsung Electronics Co. has begun shipping samples of the industry's most advanced memory to custo

光帆科技 (Guangfan Technology) signs a strategic partnership with Tencent Mobility Services and opens a new round of pre-sales

量子位 · Industry

PPIO makes 非凡产研's "2026 Global AI 100" list, leading a new wave of overseas expansion on the strength of its AI

量子位 · Industry

500 million free tokens! The world's first commercial AI host launches, so you can finally burn tokens freely

量子位 · Industry

Zero-experience floor, masterpiece ceiling! Tencent's AI game creation platform is wild

量子位 · Industry

A creative-design version of WorkBuddy is here! Tencent launches Miora, an agentic creative studio

量子位 · Industry

Just in: the world's first "event-level prediction" world model for embodied intelligence has arrived!

量子位 · Industry

A Tsinghua-affiliated team has woven an "intelligent compute grid" for large models

量子位 · Industry

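Claude 4.8 makes a splash! Some capabilities surpass Mythos, and it supports hundreds of sub-agents in parallel

量子位 · Industry

It can execute tasks over long periods without humans needing to check back on its work often.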

Behind DeepSeek V4's chip-model co-design, China's domestic compute ecosystem starts its flywheel

量子位 · Industry

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

arXiv cs.AI · Research

arXiv:2605.28849v1 Announce Type: new Abstract: Gradient temporal-difference methods provide stable off-policy prediction with linear function approximation, but their practical performance is strongly affected by the geometry induced by the auxiliary-variable metric. Existing Mirror-Prox TD methods

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

arXiv cs.AI · Research

arXiv:2605.28855v1 Announce Type: new Abstract: Temporal-difference learning with function approximation can be unstable under off-policy sampling. TDC stabilizes off-policy TD through an auxiliary covariance correction, and TDRC further regularizes this correction in a single-timescale recursion. T

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

arXiv cs.AI · Research

arXiv:2605.28864v1 Announce Type: new Abstract: The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a matc

Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems

arXiv cs.AI · Research

arXiv:2605.28883v1 Announce Type: new Abstract: Tropical forests worldwide are under intense deforestation pressure driven by economic and political interests, and scientific evidence suggests this deforestation contributes to climate change. This paper proposes a novel logging method for tropical f

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

arXiv cs.AI · Research

arXiv:2605.28897v1 Announce Type: new Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their

Orthogonal Concept Erasure for Diffusion Models

arXiv cs.AI · Research

arXiv:2605.28902v1 Announce Type: new Abstract: Concept erasure has emerged as a promising approach to mitigate undesired or unsafe content in diffusion models, yet existing methods still face significant limitations. While training-based methods are effective, their high computational cost limits s

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

arXiv cs.AI · Research

arXiv:2605.28965v1 Announce Type: new Abstract: Linking free-text phenotype descriptions to ontology terms, typically referred to as phenotype annotation, is essential for the cross-study integration of comparative morphological data. This labor-intensive process has heavily relied on highly trained

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

arXiv cs.AI · Research

arXiv:2605.28978v1 Announce Type: new Abstract: Finite Element Analysis (FEA) serves as the cornerstone of modern engineering design. However, its workflow is inherently complex and relies heavily on domain expertise. Although recent efforts have integrated Large Language Models (LLMs) into FEA, exi

BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

arXiv cs.AIResearch

arXiv:2605.28994v1 Announce Type: new Abstract: AI tools to support real world decision making must be able to build simulation models that inform their recommendations and render them interpretable. Tools that can automate aspects of modeling practice must complement human expertise, not replace it

Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

arXiv cs.AIResearch

arXiv:2605.29018v1 Announce Type: new Abstract: Although a growing body of research has begun to describe user-LLM interactions, the picture it paints is largely static; little is known about how individual users change their behavior over time. To address this gap, we analyze the conversational tr

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

arXiv cs.AIResearch

arXiv:2605.29025v1 Announce Type: new Abstract: Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, where the model's organization of the record shapes what policymakers see and which arguments register. Standard evaluation, anchored on stance accuracy a

Mind Your Tone: Does Tone Alter LLM Performance?

arXiv cs.AIResearch

arXiv:2605.29027v1 Announce Type: new Abstract: The use of Large Language Models (LLMs) is proliferating, yet their performance is observed to vary based on prompting styles and tones. In this study, we investigate both whether and how tonal variations in prompts lead to disparate LLM accuracy for o

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

arXiv cs.AIResearch

arXiv:2605.29041v1 Announce Type: new Abstract: This study reports findings from a cross-sectional survey (n = 72) of higher education practitioners examining beliefs, behaviors, and institutional conditions related to artificial intelligence (AI) integration in teaching and learning. Grounded in th

Differentiable Belief-based Opponent Shaping

arXiv cs.AIResearch

arXiv:2605.29042v1 Announce Type: new Abstract: Human coordination often relies on the ability to influence the beliefs of others through strategic action. In multi-agent reinforcement learning, opponent shaping attempts to replicate this influence, though existing methods typically operate within a

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

arXiv cs.AIResearch

arXiv:2605.29055v1 Announce Type: new Abstract: Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning architecture with Cont

Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

arXiv cs.CLResearch

arXiv:2605.28822v1 Announce Type: new Abstract: Defect grading of power transmission equipment (DGPTE) is crucial to the stability of electric energy transmission. Although existing machine learning methods exhibit strong capabilities in defect detection, they are plagued by difficulties in integrat

What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

arXiv cs.CLResearch

arXiv:2605.28823v1 Announce Type: new Abstract: As the influence of LLMs expands, it is imperative to gain insight into their decisions. One way to do that is to develop probes that detect the presence or absence of a broad set of concepts within the embeddings computed in an LLM - which is what we

A Modular Architecture for Typologically Controlled Lexicon Generation

arXiv cs.CLResearch

arXiv:2605.28824v1 Announce Type: new Abstract: Constructing artificial lexicons that are pronounceable, typologically plausible, and semantically structured remains an open challenge in computational linguistics. Existing conlang generators either lack formal phonotactic guarantees or delegate gene

MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models

arXiv cs.CLResearch

arXiv:2605.28825v1 Announce Type: new Abstract: Large language models (LLMs) frequently encode factual and reasoning knowledge in their internal representations that is not faithfully reflected in their surface-level outputs, a phenomenon known as latent knowledge. Existing approaches to el

From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale

arXiv cs.CLResearch

arXiv:2605.28826v1 Announce Type: new Abstract: In modern LLMs, linguistic features function not as stylistic artifacts but as probes of probability mass, allocated under training alignment objectives. Language models trained with contemporary pipelines exhibit severe reshaping of linguistic feature

RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

arXiv cs.CLResearch

arXiv:2605.28827v1 Announce Type: new Abstract: Open Arabic large language models split into two classes: sub-1B multilingual models that treat Arabic as an afterthought (Qwen2.5-0.5B, Falcon-H1-0.5B), and 7B-70B Arabic-specialized models that require a server to run (Jais, AceGPT, ALLaM, SILMA). Th

Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

arXiv cs.CLResearch

arXiv:2605.28828v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation where redundant retrieved contexts and lengthy reasoning chains amplify factual errors. Recent studies h

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

arXiv cs.CLResearch

arXiv:2605.28829v1 Announce Type: new Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on comm

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

arXiv cs.CLResearch

arXiv:2605.28830v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly deployed in safety-critical applications, robust content moderation becomes essential. We present a comprehensive evaluation of 14 open-source safety guard models on a curated benchmark of 79,331 samples

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

arXiv cs.CLResearch

arXiv:2605.28831v1 Announce Type: new Abstract: Long-horizon interactive agents often accumulate large trajectory histories yet still fail to answer questions about earlier events reliably. We argue that the main bottleneck is not context length alone, but the trajectory-to-answer interface of long-

A comparative study of transformer-based embeddings for topic coherence

arXiv cs.CLResearch

arXiv:2605.28832v1 Announce Type: new Abstract: Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word co-occurrence patterns, with Latent Dirichlet Allocation (LDA) remaining one of the most widely used

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

arXiv cs.CLResearch

arXiv:2605.28833v1 Announce Type: new Abstract: Automatic speech recognition (ASR) has the potential to substantially reduce manual annotation effort in child speech research by generating automatic transcriptions. However, obtaining reliably high-quality ASR transcriptions for child speech remains

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

arXiv cs.CLResearch

arXiv:2605.28834v1 Announce Type: new Abstract: Syllabification describes the task of dividing words into syllables. Due to many rules and exceptions, training an algorithm to perform syllabification with high accuracy remains a challenge. Over the last decades, different algorithms have been

GenesisFunc: Multi-Agent Data Generation for Accurate and Generalizable Function-Calling

arXiv cs.CLResearch

arXiv:2605.28835v1 Announce Type: new Abstract: Large Language Models (LLMs) extend their capabilities through function-calling (FC), which relies on training data with high quality, diversity, and broad scenario coverage. However, obtaining and annotating real function-calling data is challengin

No Reader Left Behind: Multi-Agent Summaries Everyone Can Understand

arXiv cs.CLResearch

arXiv:2605.28836v1 Announce Type: new Abstract: The Plain Writing Act in the United States requires government documents to be accessible in clear and simple language that the general public can easily understand, yet existing summarization systems struggle to address diverse linguistic and cognitiv

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them

arXiv cs.LGResearch

arXiv:2605.28839v1 Announce Type: new Abstract: Knowledge editing methods such as ROME and MEMIT update factual associations in transformer models by modifying MLP weights. While evaluated mainly by output behavior, their internal mechanism remains underexplored. We investigate whether edits rely on

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

arXiv cs.LGResearch

arXiv:2605.28850v1 Announce Type: new Abstract: We study behavioral alignment and representation dynamics of large language model (LLM) agents in financial decision environments. Using TradeArena, an auditable trading-agent testbed with risk reports, execution simulation, memory, and replayable traj

Mechanistic origins of catastrophic forgetting: why does RL preserve circuits better than SFT?

arXiv cs.LGResearch

arXiv:2605.28860v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing

Molecular Lead Optimization via Agentic Tool Planning

arXiv cs.LGResearch

arXiv:2605.28862v1 Announce Type: new Abstract: Drug discovery is a lengthy and resource-intensive process composed of multiple stages. Among these stages, lead optimization plays a critical role in transforming early hit compounds into viable drug candidates. This stage requires improving ADMET-rel

Self-Play Reinforcement Learning under Imperfect Information in Big 2

arXiv cs.LGResearch

arXiv:2605.28863v1 Announce Type: new Abstract: Imperfect-information multiplayer games test whether agents can act under hidden information, sparse rewards, and non-stationary opponents. We study these challenges in Big 2, a four-player imperfect-information card game. We develop a self-play RL fra

Emergent Semantic Representations in World Models through Physical Interaction without Linguistic Supervision

arXiv cs.LGResearch

arXiv:2605.28865v1 Announce Type: new Abstract: What does a world model learn from physical exploration, without any linguistic supervision? We argue the answer is organized by a single principle: the geometric structure of the physical world. Training a VAE-based world model on random embodied expl

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

arXiv cs.LGResearch

arXiv:2605.28866v1 Announce Type: new Abstract: Token-based time series large language models (TS-LLMs) have emerged as a promising direction for time series analysis and reasoning. However, prior studies largely overlook the inherent continuity and ordinality of time series tokens, which substantia

PrismFlow: Residual Dynamics for Flow Matching in Time-Series Generation

arXiv cs.LGResearch

arXiv:2605.28867v1 Announce Type: new Abstract: Generating high-quality time-series data is challenging because real-world signals often exhibit multimodal patterns and multiscale dynamics, including oscillations and high-frequency variations. Flow Matching (FM) offers an efficient alternative to di

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

arXiv cs.LGResearch

arXiv:2605.28868v1 Announce Type: new Abstract: Metagenomic taxonomic annotation aims to identify the microbial origins of DNA fragments in environmental samples. Traditional methods that rely on sequence similarity are often constrained by the high microbial diversity and the incompleteness of refe

Balancing Multimodal Learning through Label Space Reshaping

arXiv cs.LGResearch

arXiv:2605.28869v1 Announce Type: new Abstract: Multimodal learning often suffers from modality imbalance, where modalities that converge faster dominate optimization while others remain undertrained. Existing approaches typically mitigate this issue by strengthening the weak modality or adjusting o

Representation Alignment Rests on Linear Structure

arXiv cs.LGResearch

arXiv:2605.28870v1 Announce Type: new Abstract: We investigate the Platonic Representation Hypothesis (PRH) through a tripartite statistical framework of representations: signal, bias, and noise. (1) Signal: We propose that Platonic alignment arises from the universal relationship between objects a

Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit

arXiv cs.LGResearch

arXiv:2605.28873v1 Announce Type: new Abstract: This is a planning-method note with an unpaired pilot audit. We adapt the classical paired-binary sample-size calculation (Miettinen, 1968) to quantization benchmarks, giving a conservative minimum detectable effect (MDE) bound $\delta^{*} \le (z_{1-\a

Towards Continuous-time Causal Foundation Models

arXiv cs.LGResearch

arXiv:2605.28880v1 Announce Type: new Abstract: Extending discrete-time causal Prior-data Fitted Networks for time series to continuous time invites writing the mechanism as a stochastic differential equation (SDE), but if the SDE is integrated once per observation gap, the trajectory law d

Context Distillation as Latent Memory Management

arXiv cs.LGResearch

arXiv:2605.28889v1 Announce Type: new Abstract: Context distillation compresses contextual information into model parameters, yet existing methods often ignore how multiple distilled latent memories should be stored, retrieved, and safely activated in non-oracle settings. We formulate context distil

Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models

arXiv cs.LGResearch

arXiv:2605.28896v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has emerged as a widely adopted approach for adapting large language models, yet the internal representational changes induced by LoRA fine-tuning remain insufficiently understood. In this work, we investigate the geometry of

I signed up for another SaaS

Ben's BitesOther

new software benchmark

Hard budget limits now available for GitHub Advanced Security

GitHub ChangelogOpenSource

Enterprise administrators and billing managers can now set hard budget limits for GitHub Advanced Security (GHAS) SKUs, preventing teams from exceeding their allocated license budgets. Previously, license-based products like GHAS…

CodeQL 2.25.5 improves query accuracy for GitHub Actions

GitHub ChangelogOpenSource

CodeQL is the static analysis engine behind GitHub code scanning, which finds and remediates security issues in your code. We’ve recently released CodeQL 2.25.5, which includes accuracy improvements across C/C++,…

Claude Opus 4.8 is generally available for GitHub Copilot

GitHub ChangelogOpenSource

Claude Opus 4.8, Anthropic’s latest Opus model, is now available in GitHub Copilot. In our early testing, Opus 4.8 demonstrates a clear step forward in code understanding and generation across…

Still a developer. Just outside. Our latest GitHub Shop collection is here.

GitHub BlogOpenSource

The ESC collection lets you escape the confines of your desk and get out into the sun, where good ideas are bound to happen.

Leveling up Weaviate Cloud security: Expanding role-based access control for Cloud console

Weaviate BlogOther

Weaviate Cloud now supports more granular role-based access control with new Editor and Viewer roles for improved security and organizational management.

Protestware for coding agents

LobstersOther

Comments URL: https://lobste.rs/s/brusu8/protestware_for_coding_agents

SQLite Does Not Accept Agentic Code

LobstersOther

Comments URL: https://lobste.rs/s/lc26ar/sqlite_does_not_accept_agentic_code

You probably don't need Yocto, and that's fine

LobstersOther

Comments URL: https://lobste.rs/s/jp3nva/you_probably_don_t_need_yocto_s_fine

Announcing Rust 1.96.0

LobstersOther

Comments URL: https://lobste.rs/s/4jgpkn/announcing_rust_1_96_0

Garnix is shutting down

LobstersOther

Comments URL: https://lobste.rs/s/4msjpt/garnix_is_shutting_down

One year of Roto, the compiled scripting language for Rust

LobstersOther

Comments URL: https://lobste.rs/s/pd8aug/one_year_roto_compiled_scripting

How do you version public web APIs?

LobstersOther

An existing API is often called something like "Product API" and has /api/v1 in its path. To me this feels like an antipattern, especially when the API itself uses semantic versioning, because it mixes the routes with the API contract. Havi

tail CI logs over SSH

LobstersOther

Comments URL: https://lobste.rs/s/d9n2yd/tail_ci_logs_over_ssh

Why Gentoo?

LobstersOther

Comments URL: https://lobste.rs/s/nx1xwo/why_gentoo

What's cooking on SourceHut? Q2 2026

LobstersOther

Comments URL: https://lobste.rs/s/jowjkj/what_s_cooking_on_sourcehut_q2_2026

Leaving performance on the table

LobstersOther

Comments URL: https://lobste.rs/s/7lrk2t/leaving_performance_on_table

GNOME 2.20, but it's Web Components

LobstersOther

Comments URL: https://lobste.rs/s/n2mhi4/gnome_2_20_its_web_components

Nitpicking the shell history scene in ‘Tron: Legacy’

LobstersOther

Comments URL: https://lobste.rs/s/0zltfs/nitpicking_shell_history_scene_tron

What are you doing this weekend?

LobstersOther

Feel free to share what you plan to do this weekend, and even ask for help or feedback. Please keep in mind it’s more than OK to do nothing at all too!

Patching my guitar amp's firmware

LobstersOther

Comments URL: https://lobste.rs/s/1fkt8w/patching_my_guitar_amp_s_firmware

Is This Sustainable?

Hacker News FrontOther

Article URL: https://jamiehurst.co.uk/2026-05-24_ai-sustainable Comments URL: https://news.ycombinator.com/item?id=48321264 Points: 13 # Comments: 11

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

Hacker News FrontOther

Article URL: https://blog.kog.ai/real-time-llm-inference-on-standard-gpus-3-000-tokens-s-per-request/ Comments URL: https://news.ycombinator.com/item?id=48321076 Points: 36 # Comments: 27

The $500K AI Film That "Premiered at Cannes" Was Not in the Official Festival

Hacker News FrontOther

Article URL: https://firethering.com/hell-grind-ai-film-cannes-premiere-higgsfield/ Comments URL: https://news.ycombinator.com/item?id=48320985 Points: 33 # Comments: 30

Cache Aware Scheduling Shows Nice Wins for AMD Zen 5 on PostgreSQL, Valkey

Hacker News FrontOther

Article URL: https://www.phoronix.com/review/cache-aware-scheduling-hedt Comments URL: https://news.ycombinator.com/item?id=48320639 Points: 23 # Comments: 2

Volkswagen blocks Home Assistant by requiring client assertion

Hacker News FrontOther

Article URL: https://github.com/robinostlund/homeassistant-volkswagencarnet/issues/967 Comments URL: https://news.ycombinator.com/item?id=48319509 Points: 214 # Comments: 109

HeidiSQL – Lightweight MariaDB, MySQL, SQL Server, PostgreSQL and SQLite Manager

Hacker News FrontOther

Article URL: https://github.com/HeidiSQL/HeidiSQL Comments URL: https://news.ycombinator.com/item?id=48318568 Points: 38 # Comments: 13

Let's compile Quake like it's 1997

Hacker News FrontOther

Article URL: https://fabiensanglard.net/compile_like_1997/ Comments URL: https://news.ycombinator.com/item?id=48318522 Points: 47 # Comments: 25

Cars collect a startling amount of data about you

Hacker News FrontOther

Article URL: https://www.bbc.com/future/article/20260513-your-car-is-spying-on-you-its-about-to-get-worse Comments URL: https://news.ycombinator.com/item?id=48318481 Points: 365 # Comments: 179

Italians and Dutch share the same gestural instinct for teaching

Hacker News FrontOther

Article URL: https://www.mpi.nl/news/italians-and-dutch-share-same-gestural-instinct-teaching Comments URL: https://news.ycombinator.com/item?id=48318313 Points: 81 # Comments: 33

Claude Code – Everything You Can Configure That the Docs Don't Tell You

Hacker News FrontOther

Article URL: https://buildingbetter.tech/p/i-read-the-claude-code-source-code Comments URL: https://news.ycombinator.com/item?id=48318174 Points: 147 # Comments: 28

Blue Origin's New Glenn blows up during static fire test

Hacker News FrontOther

https://twitter.com/nasaspaceflight/status/20601649284728548... https://xcancel.com/nasaspaceflight/status/20601649284728548... https://twitter.com/SawyerMerritt/status/2060174287563116696... https://xcancel.com/SawyerMerritt/status/2060174287563116696... https://arstechnica.com/space/2026/05/blue-origi

GitHub bans security researcher who posted zero-day Windows exploits

Hacker News FrontOther

Article URL: https://www.tomshardware.com/tech-industry/cyber-security/microsofts-github-bans-security-researcher-who-posted-zero-day-windows-exploits-because-company-ruined-their-life-expert-claims-action-is-vindictive-and-promises-further-retaliation Comments URL: https://news.ycombinator.com/item

I made a million dollar product from my dorm room (2025)

Hacker News FrontOther

Article URL: https://nick.winans.io/blog/nice-nano/ Comments URL: https://news.ycombinator.com/item?id=48314951 Points: 448 # Comments: 68

Bricks and Minifigs Stole a Man's $200k Lego Collection

Hacker News FrontOther

Article URL: https://mybricklog.com/blog/bricks-minifigs-corporate-stole-old-mans-200000-lego-collection Comments URL: https://news.ycombinator.com/item?id=48314136 Points: 1021 # Comments: 473

Robots say goodbye to frame-by-frame action learning! World's first event-level embodied-intelligence world model released

AIbase.cnOpenSource

Up threefold! Enterprise AI search unicorn Glean's annual revenue tops $300 million

AIbase.cnOpenSource

Oculus founder launches a new startup! Conversational-AI newcomer Sesame releases an iOS app built around "thinking while talking"

AIbase.cnOpenSource

Alibaba Cloud Bailian goes fully CLI and open source: one command to orchestrate the full stack of AI agent capabilities

AIbase.cnOpenSource

Largest chip-leasing deal in history! Apollo and Blackstone raise $36 billion to buy up Google TPUs for Anthropic

AIbase.cnOpenSource

Tech industry shake-up: MiniMax passes one million enterprise customers as Creality knocks on the door of the Hong Kong stock market

AIbase.cnOpenSource

Hawk-Eye 2.0 is here! NBA to bring in an AI system to replace human out-of-bounds calls

AIbase.cnOpenSource

Planning ahead: Mistral AI CEO says developing in-house chips is only a matter of time

AIbase.cnOpenSource

Standardization Administration of China releases "AI Ethics and Safety Guidelines 1.0", installing a "safety brake" for large-model deployment

AIbase.cnOpenSource

A better experience for human players! Free, open-source AI chess engine Maia 3 officially released

AIbase.cnOpenSource

Mistral AI moves into high-end manufacturing: teams up with Airbus and BMW to bet on the new "physical AI" track

AIbase.cnOpenSource

SPARK2026 Tencent Games conference: updates on more than 40 games and new progress on several game-AI applications

AIbase.cnOpenSource

Altman changes his tune: AI's impact on white-collar jobs isn't that severe

AIbase.cnOpenSource

Making LLMs tell you how confident they really are through probe-targeted fine-tuning [R]

Reddit r/MachineLearningOther

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Reddit r/MachineLearningOther

Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]

Reddit r/MachineLearningOther

Hopfield Memory in VLA [R]

Reddit r/MachineLearningOther

Social Simulation with LLMs - Fidelity in Applications (CFP @ COLM'26) [R]

Reddit r/MachineLearningOther

Wall-OSS-0.5: 4B VLA with open training code and zero-shot real-robot evaluation [D]

Reddit r/MachineLearningOther

AI-generated CUDA kernels silently break training and inference [R]

Reddit r/MachineLearningOther

ACM MM 2026 review discussion [D]

Reddit r/MachineLearningOther

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems [R]

Reddit r/MachineLearningOther

Training GPT-like model on non-language series [R]

Reddit r/MachineLearningOther

STEM PhDs transitioning to MLE/Data [R]

Reddit r/MachineLearningOther

BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]

Reddit r/MachineLearningOther

I used the NEAT algorithm to teach an AI to control a worm in the game I'm making! It uses evolution to improve. [P]

Reddit r/MachineLearningOther

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

Reddit r/MachineLearningOther

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Reddit r/MachineLearningOther

harry0703 / MoneyPrinterTurbo

GitHub TrendingOpenSource

microsoft / markitdown

GitHub TrendingOpenSource

EveryInc / compound-engineering-plugin

GitHub TrendingOpenSource

twentyhq / twenty

GitHub TrendingOpenSource

anthropics / claude-code

GitHub TrendingOpenSource

Leonxlnx / taste-skill

GitHub TrendingOpenSource

cursor / plugins

GitHub TrendingOpenSource

run-llama / liteparse

GitHub TrendingOpenSource

galilai-group / stable-worldmodel

GitHub TrendingOpenSource

byoungd / English-level-up-tips

GitHub TrendingOpenSource

Biohub / esm

GitHub TrendingOpenSource

Crosstalk-Solutions / project-nomad

GitHub TrendingOpenSource

DigitalPlatDev / FreeDomain

GitHub TrendingOpenSource

affaan-m / ECC

GitHub TrendingOpenSource

hardikpandya / stop-slop

GitHub TrendingOpenSource

DataTalksClub / data-engineering-zoomcamp

GitHub TrendingOpenSource

codecrafters-io / build-your-own-x

GitHub TrendingOpenSource
