TheHackerNews · 7d ago
Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit
An unknown threat actor has been observed using a large language model (LLM) agent to conduct post-compromise actions after obtaining ini…
SecurityWeek · 21h ago
In Other News: Anthropic Maps AI Threats, Unpatched Comodo Flaw, Palantir Chief Eyed for CISA
Other noteworthy stories that might have slipped under the radar: Ultrahuman data leak, The Gentlemen ransomware analysis, Ho…
DeepMind · 32w ago
Introducing CodeMender: an AI agent for code security
Using advanced AI to fix critical software vulnerabilities
arXiv AI · 7h ago
Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
arXiv:2606.05429v1 Announce Type: new Abstract: Post-training quantization (PTQ) is critical for the efficient depl…
arXiv AI · 7h ago
Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison
arXiv:2606.05436v1 Announce Type: new Abstract: Summarizing the latest medical literatu…
arXiv AI · 7h ago
Evaluating Agentic Configuration Repair for Computer Networks
arXiv:2606.06212v1 Announce Type: new Abstract: Misconfigurations in computer networks remain a major source of critical Internet outages. Research is turnin…
DeepMind · 25w ago
Deepening our partnership with the UK AI Security Institute
Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
arXiv AI · 7h ago
GITCO: Gated Inference-Time Context Optimization in TSFMs
arXiv:2606.05332v1 Announce Type: new Abstract: Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches c…
KrebsOnSecurity · 3w ago
Patch Tuesday, May 2026 Edition
Artificial intelligence platforms may be just as susceptible to social engineering as human beings, but they are proving remarkably good at finding security vulnerabilities in human-made…
Microsoft · 11w ago
Announcing Copilot leadership update
Satya Nadella, Chairman and CEO, and Mustafa Suleyman, Executive Vice President and CEO of Microsoft AI, shared the below communications with Microsoft employees this morning. SATYA…
NIST · 1d ago
New AI Model Shows How to Evacuate for Fires One Safe Step at a Time
A NIST-led team has created a new AI model that can identify safe evacuation routes in a single-story floor plan during a fire, with a multilevel vers…
NIST · 20w ago
CAISI Issues Request for Information About Securing AI Agent Systems
The Center for AI Standards and Innovation (CAISI) at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) has publ…
NIST · 35w ago
CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks
The Center for AI Standards and Innovation at NIST evaluated several leading models from DeepSeek, an AI company based in the People’s Republic of Chin…
Product Hunt · 5d ago
Astra Autonomous Pentest
AI agents that find, validate, and fix every vulnerability
TheHackerNews · 3h ago
AI Agent Uncovers 21 Zero-Days in FFmpeg; Chrome Patches Record 429 Bugs
Two things landed within days of each other this week. A security startup reported 21 previously unknown vulnerabilities in FFmpeg, the media libr…
TheHackerNews · 7d ago
ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface
Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that leverages the artificial intelligence (AI) assistan…
TheHackerNews · 1d ago
ThreatsDay Bulletin: AI Agents Gone Wrong, Sketchy C2 Tools, ClickFix Tricks, JS Backdoors & 20+ New Stories
It got stupid again. The internet still feels held together with tape.
arXiv AI · 7h ago
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arXiv:2606.05256v1 Announce Type: new Abstract: This study analyzes a publicly released dataset from a discontinued fie…
arXiv AI · 7h ago
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
arXiv:2606.05304v1 Announce Type: new Abstract: Multi-agent systems (MAS) built on large language models are typically organized aroun…
arXiv AI · 7h ago
SentinelBench: A Benchmark for Long-Running Monitoring Agents
arXiv:2606.05342v1 Announce Type: new Abstract: AI agents are increasingly asked to carry out work that spans minutes, hours, or longer. Yet the default mode…
arXiv AI · 7h ago
Synthetic Contrastive Reasoning for Multi-Table Q&A
arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional r…
arXiv AI · 7h ago
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
arXiv:2606.05384v1 Announce Type: new Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where m…
arXiv AI · 7h ago
Residual Modeling for High-Fidelity Learned Compression of Scientific Data
arXiv:2606.05389v1 Announce Type: new Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Lear…
arXiv AI · 7h ago
Agents' Last Exam
arXiv:2606.05405v1 Announce Type: new Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment…
arXiv AI · 7h ago
Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers
arXiv:2606.05420v1 Announce Type: new Abstract: The rapid proliferation of hyperscale data centers (HDCs) in the US, mainly driven by…
arXiv AI · 7h ago
Insurance of Agentic AI
arXiv:2606.05449v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) systems are transforming the risk landscape by extending beyond information generation to autonomous planning,…
arXiv AI · 7h ago
PSEBench: A Controllable and Verifiable Benchmark for Evaluating LLMs in Patient Safety Event Triage
arXiv:2606.05463v1 Announce Type: new Abstract: Patient safety event triage, determining whether a clinical event is r…
arXiv AI · 7h ago
Severity-Aware Curriculum Learning with Multi-Model Response Selection for Medical Text Generation
arXiv:2606.05510v1 Announce Type: new Abstract: Telehealth systems have become increasingly important for delivering acc…
arXiv AI · 7h ago
EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts
arXiv:2606.05513v1 Announce Type: new Abstract: Epidemic LLM forecasters are usually trained and evaluated as static supervised mode…
arXiv AI · 7h ago
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization
arXiv:2606.05525v1 Announce Type: new Abstract: Recent advances in agentic visualization have enabled the translati…
arXiv AI · 7h ago
When Should We Protect AI? A Precautionary Framework for Consciousness Uncertainty
arXiv:2606.05528v1 Announce Type: new Abstract: Existing frameworks assess whether AI systems might be conscious but provide no guidance…
arXiv AI · 7h ago
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
arXiv:2606.05563v1 Announce Type: new Abstract: Evaluating LLM mediators remains challenging, as m…
arXiv AI · 7h ago
Multilingual Fine-Tuning via Localized Gradient Conflict Resolution
arXiv:2606.05613v1 Announce Type: new Abstract: The rapid evolution of Large Language Models (LLMs) has established cross-lingual versatility as a defi…
arXiv AI · 7h ago
Evaluation of LLMs for Mathematical Formalization in Lean
arXiv:2606.05632v1 Announce Type: new Abstract: Within the past few years, the ability of Large Language Models (LLMs) to generate formal mathematical proofs has…
arXiv AI · 7h ago
Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?
arXiv:2606.05647v1 Announce Type: new Abstract: AI coding agents are increasingly embedded in real-world software development, collaborating with human…
arXiv AI · 7h ago
Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows
arXiv:2606.05670v1 Announce Type: new Abstract: Does adding more agents help an LLM workflow once compared systems share the same be…
arXiv AI · 7h ago
Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation
arXiv:2606.05682v1 Announce Type: new Abstract: Demand for low-precision inference, including NVFP4-based approaches, has grown as large lang…
arXiv AI · 7h ago
DiG-Plan: Mitigating Early Commitment for Tool-Graph Planning via Diffusion Guidance
arXiv:2606.05728v1 Announce Type: new Abstract: Generating executable tool plans requires selecting appropriate subsets from tool libr…
arXiv AI · 7h ago
When AI Says It Feels
arXiv:2606.05734v1 Announce Type: new Abstract: Large language models (LLMs) are generally constrained from expressing feelings through human-preference alignment in post-training processes. This p…
arXiv AI · 7h ago
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
arXiv:2606.05761v1 Announce Type: new Abstract: Persistent AI assistants, such as OpenClaw, accumulate large collecti…
arXiv AI · 7h ago
From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents
arXiv:2606.05805v1 Announce Type: new Abstract: LLM-based guardrails typically safeguard agents by evaluating pro…
arXiv AI · 7h ago
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
arXiv:2606.05806v1 Announce Type: new Abstract: Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized "hap…
arXiv AI · 7h ago
Statistical Priors for Implicit Preferences: Decoupling Skill Selection as a Local Harness in Personal Agents
arXiv:2606.05828v1 Announce Type: new Abstract: As Large Language Model (LLM) capabilities advance, locally d…
arXiv AI · 7h ago
Agentic Molecular Recovery via Molecule-Aware Exploration
arXiv:2606.05847v1 Announce Type: new Abstract: Text-guided molecular generation with LLMs often yields invalid SMILES. We argue that invalid drafts should be ad…
arXiv AI · 7h ago
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
arXiv:2606.05875v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) improves large language model (LLM) answer quality by g…
arXiv AI · 7h ago
A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR
arXiv:2606.05932v1 Announce Type: new Abstract: Reinforcement learning from verifiable rewards (RLVR) improves reasoning even w…
arXiv AI · 7h ago
Edit-R2: Context-Aware Reinforcement Learning for Multi-Turn Image Editing
arXiv:2606.05950v1 Announce Type: new Abstract: Text-guided image editing has advanced rapidly with diffusion models and unified multimodal foun…
arXiv AI · 7h ago
The Self-Correction Illusion: LLMs Correct Others but Not Themselves
arXiv:2606.05976v1 Announce Type: new Abstract: Recent work shows that LLM agents struggle to correct errors in their own reasoning traces yet show ma…
arXiv AI · 7h ago
Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI
arXiv:2606.05983v1 Announce Type: new Abstract: Generative AI makes answers easy and understanding hard, and…
arXiv AI · 7h ago
Beyond Vector Similarity: A Structural Analysis of Graph-Augmented Retrieval for Industrial Knowledge Graphs
arXiv:2606.06003v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) fails systematically on q…
arXiv AI · 7h ago
RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit
arXiv:2606.06027v1 Announce Type: new Abstract: Community-conditioned language model adaptation requires choices about data collect…
arXiv AI · 7h ago
Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
arXiv:2606.06036v1 Announce Type: new Abstract: Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. Whil…
arXiv AI · 7h ago
Beyond Similarity: Trustworthy Memory Search for Personal AI Agents
arXiv:2606.06054v1 Announce Type: new Abstract: Personal AI agents increasingly rely on long-term memory to provide persistent personalization across s…
arXiv AI · 7h ago
When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents
arXiv:2606.06055v1 Announce Type: new Abstract: Long-term memory enables language model agents to support persona…
arXiv AI · 7h ago
Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents
arXiv:2606.06090v1 Announce Type: new Abstract: LLM-based agents increasingly tackle long-horizon tasks with interdependent deci…
arXiv AI · 7h ago
CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model
arXiv:2606.06099v1 Announce Type: new Abstract: Whether Large Language Models (LLMs) exhibit covert psychological manipul…
arXiv AI · 7h ago
Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems
arXiv:2606.06114v1 Announce Type: new Abstract: Self-evolving agents improve through continual self-play a…
arXiv AI · 7h ago
Amortizing Federated Adaptation: Hypernetwork Driven LoRA for Personalized Foundation Models
arXiv:2606.06154v1 Announce Type: new Abstract: Federated fine-tuning of foundation models using Low-Rank Adaptation (LoRA) of…
arXiv AI · 7h ago
Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains
arXiv:2606.06201v1 Announce Type: new Abstract: Pharmaceutical supply chains (PSCs) strugg…
arXiv AI · 7h ago
ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents
arXiv:2606.06284v1 Announce Type: new Abstract: Large language model agents increasingly rely on external tools, but larger tool menus can reduc…