AI & ML

TheHackerNews · 7d ago

Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit

Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit An unknown threat actor has been observed using a large language model (LLM) agent to conduct post-compromise actions after obtaining ini…

impact 59

SecurityWeek · 21h ago

In Other News: Anthropic Maps AI Threats, Unpatched Comodo Flaw, Palantir Chief Eyed for CISA

In Other News: Anthropic Maps AI Threats, Unpatched Comodo Flaw, Palantir Chief Eyed for CISA Other noteworthy stories that might have slipped under the radar: Ultrahuman data leak, The Gentlemen ransomware analysis, Ho…

impact 50

DeepMind · 32w ago

Introducing CodeMender: an AI agent for code security

Introducing CodeMender: an AI agent for code security Using advanced AI to fix critical software vulnerabilities

impact 41

arXiv AI · 7h ago

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models arXiv:2606.05429v1 Announce Type: new Abstract: Post-training quantization (PTQ) is critical for the efficient depl…

impact 34

arXiv AI · 7h ago

Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison

Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison arXiv:2606.05436v1 Announce Type: new Abstract: Summarizing the latest medical literatu…

impact 34

arXiv AI · 7h ago

Evaluating Agentic Configuration Repair for Computer Networks

Evaluating Agentic Configuration Repair for Computer Networks arXiv:2606.06212v1 Announce Type: new Abstract: Misconfigurations in computer networks remain a major source of critical Internet outages. Research is turnin…

impact 34

DeepMind · 25w ago

Deepening our partnership with the UK AI Security Institute

Deepening our partnership with the UK AI Security Institute Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research

impact 33

arXiv AI · 7h ago

GITCO: Gated Inference-Time Context Optimization in TSFMs

GITCO: Gated Inference-Time Context Optimization in TSFMs arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches c…

impact 26

KrebsOnSecurity · 3w ago

Patch Tuesday, May 2026 Edition

Patch Tuesday, May 2026 Edition Artificial intelligence platforms may be just as susceptible to social engineering as human beings, but they are proving remarkably good at finding security vulnerabilities in human-made…

impact 26

Microsoft · 11w ago

Announcing Copilot leadership update

Announcing Copilot leadership update Satya Nadella, Chairman and CEO, and Mustafa Suleyman, Executive Vice President and CEO of Microsoft AI, shared the below communications with Microsoft employees this morning. SATYA…

impact 26

NIST · 1d ago

New AI Model Shows How to Evacuate for Fires One Safe Step at a Time

New AI Model Shows How to Evacuate for Fires One Safe Step at a Time A NIST-led team has created a new AI model that can identify safe evacuation routes in a single-story floor plan during a fire, with a multilevel vers…

impact 24

NIST · 20w ago

CAISI Issues Request for Information About Securing AI Agent Systems

CAISI Issues Request for Information About Securing AI Agent Systems The Center for AI Standards and Innovation (CAISI) at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) has publ…

impact 24

NIST · 35w ago

CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks

CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks The Center for AI Standards and Innovation at NIST evaluated several leading models from DeepSeek, an AI company based in the People’s Republic of Chin…

impact 24

Product Hunt · 5d ago

Astra Autonomous Pentest

Astra Autonomous Pentest AI agents that find, validate, and fix every vulnerability

impact 23

TheHackerNews · 3h ago

AI Agent Uncovers 21 Zero-Days in FFmpeg; Chrome Patches Record 429 Bugs

AI Agent Uncovers 21 Zero-Days in FFmpeg; Chrome Patches Record 429 Bugs Two things landed within days of each other this week. A security startup reported 21 previously unknown vulnerabilities in FFmpeg, the media libr…

impact 20

TheHackerNews · 7d ago

ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface

ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that leverages the artificial intelligence (AI) assistan…

impact 20

TheHackerNews · 1d ago

ThreatsDay Bulletin: AI Agents Gone Wrong, Sketchy C2 Tools, ClickFix Tricks, JS Backdoors & 20+ New Stories

ThreatsDay Bulletin: AI Agents Gone Wrong, Sketchy C2 Tools, ClickFix Tricks, JS Backdoors & 20+ New Stories It got stupid again. The internet still feels held together with tape.

impact 19

arXiv AI · 7h ago

How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment

How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment arXiv:2606.05256v1 Announce Type: new Abstract: This study analyzes a publicly released dataset from a discontinued fie…

impact 16

arXiv AI · 7h ago

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems arXiv:2606.05304v1 Announce Type: new Abstract: Multi-agent systems (MAS) built on large language models are typically organized aroun…

impact 16

arXiv AI · 7h ago

SentinelBench: A Benchmark for Long-Running Monitoring Agents

SentinelBench: A Benchmark for Long-Running Monitoring Agents arXiv:2606.05342v1 Announce Type: new Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default mode…

impact 16

arXiv AI · 7h ago

Synthetic Contrastive Reasoning for Multi-Table Q&A

Synthetic Contrastive Reasoning for Multi-Table Q&A arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional r…

impact 16

arXiv AI · 7h ago

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges arXiv:2606.05384v1 Announce Type: new Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where m…

impact 16

arXiv AI · 7h ago

Residual Modeling for High-Fidelity Learned Compression of Scientific Data

Residual Modeling for High-Fidelity Learned Compression of Scientific Data arXiv:2606.05389v1 Announce Type: new Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Lear…

impact 16

arXiv AI · 7h ago

Agents' Last Exam

Agents' Last Exam arXiv:2606.05405v1 Announce Type: new Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment…

impact 16

arXiv AI · 7h ago

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers arXiv:2606.05420v1 Announce Type: new Abstract: The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by…

impact 16

arXiv AI · 7h ago

Insurance of Agentic AI

Insurance of Agentic AI arXiv:2606.05449v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning,…

impact 16

arXiv AI · 7h ago

PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage

PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage arXiv:2606.05463v1 Announce Type: new Abstract: Patient safety event triage, determining whether a clinical event is r…

impact 16

arXiv AI · 7h ago

Severity-Aware Curriculum Learning with Multi-Model Response Selection for Medical Text Generation

Severity-Aware Curriculum Learning with Multi-Model Response Selection for Medical Text Generation arXiv:2606.05510v1 Announce Type: new Abstract: Telehealth systems have become increasingly important for delivering acc…

impact 16

arXiv AI · 7h ago

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts arXiv:2606.05513v1 Announce Type: new Abstract: Epidemic LLM forecasters are usually trained and evaluated as static supervised mode…

impact 16

arXiv AI · 7h ago

SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization arXiv:2606.05525v1 Announce Type: new Abstract: Recent advances in agentic visualization have enabled the translati…

impact 16

arXiv AI · 7h ago

When Should We Protect AI? A Precautionary Framework for Consciousness Uncertainty

When Should We Protect AI? A Precautionary Framework for Consciousness Uncertainty arXiv:2606.05528v1 Announce Type: new Abstract: Existing frameworks assess whether AI systems might be conscious but provide no guidance…

impact 16

arXiv AI · 7h ago

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations arXiv:2606.05563v1 Announce Type: new Abstract: Evaluating LLM mediators remains challenging, as m…

impact 16

arXiv AI · 7h ago

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution

Multilingual Fine-Tuning via Localized Gradient Conflict Resolution arXiv:2606.05613v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) has established cross-lingual versatility as a defi…

impact 16

arXiv AI · 7h ago

Evaluation of LLMs for Mathematical Formalization in Lean

Evaluation of LLMs for Mathematical Formalization in Lean arXiv:2606.05632v1 Announce Type: new Abstract: Within the past few years, the ability of Large Language Models (LLMs) to generate formal mathematical proofs has…

impact 16

arXiv AI · 7h ago

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage? arXiv:2606.05647v1 Announce Type: new Abstract: AI coding agents are increasingly embedded in real-world software development, collaborating with human…

impact 16

arXiv AI · 7h ago

Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows

Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows arXiv:2606.05670v1 Announce Type: new Abstract: Does adding more agents help an LLM workflow once compared systems share the same be…

impact 16

arXiv AI · 7h ago

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation arXiv:2606.05682v1 Announce Type: new Abstract: Demand for low-precision inference, including NVFP4-based approaches, has grown as large lang…

impact 16

arXiv AI · 7h ago

DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance

DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance arXiv:2606.05728v1 Announce Type: new Abstract: Generating executable tool plans requires selecting appropriate subsets from tool libr…

impact 16

arXiv AI · 7h ago

When AI Says It Feels

When AI Says It Feels arXiv:2606.05734v1 Announce Type: new Abstract: Large language models (LLMs) are generally constrained from expressing feelings through human-preference alignment in post-training processes. This p…

impact 16

arXiv AI · 7h ago

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents arXiv:2606.05761v1 Announce Type: new Abstract: Persistent AI assistants, such as OpenClaw, accumulate large collecti…

impact 16

arXiv AI · 7h ago

From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents

From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents arXiv:2606.05805v1 Announce Type: new Abstract: LLM-based guardrails typically safeguard agents by evaluating pro…

impact 16

arXiv AI · 7h ago

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents arXiv:2606.05806v1 Announce Type: new Abstract: Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized ''hap…

impact 16

arXiv AI · 7h ago

Statistical Priors for Implicit Preferences: Decoupling Skill Selection as a Local Harness in Personal Agents

Statistical Priors for Implicit Preferences: Decoupling Skill Selection as a Local Harness in Personal Agents arXiv:2606.05828v1 Announce Type: new Abstract: As Large Language Model (LLM) capabilities advance, locally d…

impact 16

arXiv AI · 7h ago

Agentic Molecular Recovery via Molecule-Aware Exploration

Agentic Molecular Recovery via Molecule-Aware Exploration arXiv:2606.05847v1 Announce Type: new Abstract: Text-guided molecular generation with LLMs often yields invalid SMILES. We argue that invalid drafts should be ad…

impact 16

arXiv AI · 7h ago

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving arXiv:2606.05875v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) improves large language model (LLM) answer quality by g…

impact 16

arXiv AI · 7h ago

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR arXiv:2606.05932v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards (RLVR) improves reasoning even w…

impact 16

arXiv AI · 7h ago

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing

Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing arXiv:2606.05950v1 Announce Type: new Abstract: Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foun…

impact 16

arXiv AI · 7h ago

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

The Self-Correction Illusion: LLMs Correct Others but Not Themselves arXiv:2606.05976v1 Announce Type: new Abstract: Recent work shows that LLM agents struggle to correct errors in their own reasoning traces yet show ma…

impact 16

arXiv AI · 7h ago

Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI

Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI arXiv:2606.05983v1 Announce Type: new Abstract: Generative AI makes answers easy and understanding hard, and…

impact 16

arXiv AI · 7h ago

Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs

Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs arXiv:2606.06003v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) fails systematically on q…

impact 16

arXiv AI · 7h ago

RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit

RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit arXiv:2606.06027v1 Announce Type: new Abstract: Community-conditioned language model adaptation requires choices about data collect…

impact 16

arXiv AI · 7h ago

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents arXiv:2606.06036v1 Announce Type: new Abstract: Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. Whil…

impact 16

arXiv AI · 7h ago

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents arXiv:2606.06054v1 Announce Type: new Abstract: Personal AI agents increasingly rely on long-term memory to provide persistent personalization across s…

impact 16

arXiv AI · 7h ago

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents

When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents arXiv:2606.06055v1 Announce Type: new Abstract: Long-term memory enables language model agents to support persona…

impact 16

arXiv AI · 7h ago

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents arXiv:2606.06090v1 Announce Type: new Abstract: LLM-based agents increasingly tackle long-horizon tasks with interdependent deci…

impact 16

arXiv AI · 7h ago

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model arXiv:2606.06099v1 Announce Type: new Abstract: Whether Large Language Models (LLMs) exhibit covert psychological manipul…

impact 16

arXiv AI · 7h ago

Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems

Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems arXiv:2606.06114v1 Announce Type: new Abstract: Self-evolving agents improve through continual self-play a…

impact 16

arXiv AI · 7h ago

Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models

Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models arXiv:2606.06154v1 Announce Type: new Abstract: Federated fine-tuning of foundation models using Low-Rank Adaptation (LoRA) of…

impact 16

arXiv AI · 7h ago

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains arXiv:2606.06201v1 Announce Type: new Abstract: Pharmaceutical supply chains (PSCs) strugg…

impact 16

arXiv AI · 7h ago

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents arXiv:2606.06284v1 Announce Type: new Abstract: Large language model agents increasingly rely on external tools, but larger tool menus can reduc…

impact 16
