llm · Related Events
FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety
2026-06-04 · OPEN_SOURCE · Impact: MEDIUM
AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety
2026-06-04 · BREAKTHROUGH · Impact: HIGH
CodegenBench: Can LLMs Write Efficient Code Across Architectures?
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
The Biomimetic Architecture of Software 4.0
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Unpredictable Safety: Domain-Dependent Compliance and the Transparency Gap in Open-Weight LLMs
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Unpredictable Safety: Domain-Dependent Compliance and the Transparency Gap in Open-Weight LLMs
2026-06-04 · REGULATION · Impact: MEDIUM
LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
LLM Compression with Jointly Optimizing Architectural and Quantization choices
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Plan First, Judge Later, Run Better: A DMAIC-Inspired Agentic System for Industrial Anomaly Detection
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Parthenon Law: A Self-Evolving Legal-Agent Framework
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
A Normative Intermediate Representation for ASP-Based Compliance Reasoning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
A Normative Intermediate Representation for ASP-Based Compliance Reasoning
2026-06-04 · REGULATION · Impact: MEDIUM
dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Semantic Constraint Synthesis for Adaptive Trajectory Optimization via Large Language Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Dual Advantage Fields
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Rollout-Level Advantage-Prioritized Experience Replay for GRPO
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Ekka: Automated Diagnosis of Silent Errors in LLM Inference
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Learning Empirically Admissible Neural Heuristics for Combinatorial Search
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Provably Auditable and Safe LLM Agents from Human-Authored Ontologies
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Provably Auditable and Safe LLM Agents from Human-Authored Ontologies
2026-06-04 · REGULATION · Impact: MEDIUM
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
From Agent Traces to Trust: Evidence Tracing and Execution Provenance in LLM Agents
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
DAR: Deontic Reasoning with Agentic Harnesses
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM