Kimi-Linear-48B-A3B brings linear attention to a 48B model with only 3B active params, now available as GGUF; TEAM accelerates MoE diffusion LLMs via temporal-spatial expert activation; OneVision-Encoder proposes codec-aligned sparsity for multimodal models.
Kimi-Linear-48B-A3B-Instruct GGUF release shows linear attention models reaching local deployment; the next bottleneck is quantization-aware kernel support in llama.cpp for non-softmax attention variants (the generic recurrence such kernels would serve is sketched below).
3 sources
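Why non-softmax variants need their own kernels: linear attention swaps the softmax over all prior tokens for a kernel feature map plus a running state, so the causal pass is an O(n) recurrence rather than a growing attention matrix. A minimal sketch of that generic recurrence, assuming an elu+1 feature map and single-head shapes (illustrative, not Kimi-Linear's actual kernel):

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1, a common linear-attention choice.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    # Q, K: (seq, d_k); V: (seq, d_v). Causal, step by step:
    # S accumulates phi(k_t) v_t^T and z accumulates phi(k_t), so each
    # token costs O(d_k * d_v) instead of attending over all prior tokens.
    d_k, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    z = np.zeros(d_k)
    out = np.zeros_like(V, dtype=float)
    for t in range(Q.shape[0]):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out
```

One plausible reason kernel support lags: the running state S likely needs higher precision than 4-bit weights, a concern standard softmax-attention kernels never had.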
Step-3.5-Flash praised as strong for its size (140 upvotes); separate post (515 upvotes, 231 comments) discusses negative outlook for local LLM community, suggesting tension between cloud and local deployment economics.
2 sources
- reddit Step-3.5-Flash IS A BEAST 140pts
- reddit Bad news for local bros 515pts
Efficient-SAM2 accelerates SAM2 with object-aware visual encoding and memory retrieval for real-time video (a generic memory-retrieval sketch follows the sources); SAM3 node update adds text-prompt detection and background removal in ComfyUI workflows.
2 sources
- arxiv Efficient-SAM2: Accelerating SAM2 with Object-Aware...
- reddit SAM3 Node Update 117pts
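How memory retrieval buys speed, in its generic form: keep a bank of per-frame object embeddings and cross-attend only to the top-k entries most similar to the current frame, instead of every past frame. Shapes and cosine scoring below are illustrative assumptions, not Efficient-SAM2's exact design:

```python
import numpy as np

def retrieve_memories(query, bank, k=4):
    # query: (d,) current-frame object embedding; bank: (n_frames, d)
    # stored memory embeddings. Cosine-score the bank, keep the top-k.
    sims = bank @ query / (
        np.linalg.norm(bank, axis=1) * np.linalg.norm(query) + 1e-8
    )
    top = np.argsort(sims)[-k:]
    return bank[top]  # only these feed the memory cross-attention
```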
Data exfiltration from messaging app agents via URL previews demonstrated; MUZZLE proposes agentic red-teaming of web agents against indirect prompt injection; StealthRL uses RL to evade multiple AI-text detectors simultaneously.
Prompt injection attacks now demonstrated against deployed agent products (OpenClaw example); the next bottleneck is that defenses require input sanitization at the tool-call boundary, which no major agent framework standardizes yet (a minimal sketch of such a boundary follows the sources).
3 sources
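What sanitization at the tool-call boundary could look like: treat everything a tool returns as data, defang URLs so preview-based exfiltration can't fire, and flag rather than silently drop injection-style imperatives. The regexes and fencing scheme are illustrative assumptions, not any framework's API:

```python
import re

URL_RE = re.compile(r"https?://\S+")
IMPERATIVE_RE = re.compile(
    r"(ignore (all|previous) instructions|you must now|system prompt)",
    re.IGNORECASE,
)

def sanitize_tool_output(text: str) -> str:
    # Defang URLs so a later step can't exfiltrate data via link previews.
    text = URL_RE.sub(lambda m: m.group(0).replace("://", "[:]//"), text)
    # Flag common injection phrasings instead of deleting them, so the
    # agent (or a human reviewer) can see the content tried to steer it.
    text = IMPERATIVE_RE.sub(lambda m: f"[FLAGGED: {m.group(0)}]", text)
    # Fence the payload so the model sees it as untrusted data, not prompt.
    return f"<untrusted_tool_output>\n{text}\n</untrusted_tool_output>"
```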
Qwen3-Coder-Next praised as best general-purpose model at its size (530 upvotes); Qwen3.5 support merged in llama.cpp; abliterated GGUF variant published with 4865 downloads; Qwen-Image-Edit LoRA trained for image style transfer.
Qwen3.5 llama.cpp merge and abliterated GGUFs already shipping; the next bottleneck is whether Qwen3.5 quantized variants maintain quality parity with full precision on reasoning benchmarks (a spot-check harness is sketched after the sources).
4 sources
- reddit Do not Let the "Coder" in Qwen3-Coder-Next Fool You!... 530pts
- reddit Qwen3.5 Support Merged in llama.cpp 234pts
- huggingface bartowski/huihui-ai_Qwen3-Coder-Next-abliterated-GGUF
- reddit Coloring Book Qwen Image Edit LoRA 461pts
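A cheap way to start answering the parity question locally: greedy-decode the same prompts on the full-precision and quantized GGUFs and diff the outputs, via llama-cpp-python. Filenames and prompts below are placeholders; this is a smoke test, not a reasoning benchmark:

```python
from llama_cpp import Llama

PROMPTS = [
    "Solve step by step: a train leaves at 3pm at 60 km/h...",  # placeholder probe
    "Write a Python function that merges two sorted lists.",
]

# Hypothetical filenames for the full-precision and quantized builds.
full = Llama(model_path="qwen3.5-f16.gguf", n_ctx=4096, verbose=False)
quant = Llama(model_path="qwen3.5-q4_k_m.gguf", n_ctx=4096, verbose=False)

def greedy(llm, prompt):
    # temperature=0 makes the comparison deterministic per model build.
    return llm(prompt, max_tokens=256, temperature=0.0)["choices"][0]["text"]

for p in PROMPTS:
    same = greedy(full, p) == greedy(quant, p)
    print("MATCH " if same else "DIVERGED", p[:48])
```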
Self-supervised bootstrapping replaces rigid CoT templates in VLA models; dexterous manipulation policies learned from RGB human videos via 3D hand-object trajectory reconstruction; χ₀ addresses distributional inconsistencies as the primary bottleneck in long-horizon robotic manipulation.
χ₀ identifies distributional inconsistency (not data scale) as the primary bottleneck for reliable long-horizon manipulation — next step is whether self-supervised CoT bootstrapping can close sim-to-real transfer gaps without domain-specific templates.
Reddit discussion (38 upvotes, 43 comments) questions whether autoregressive video world models are the right foundation for robot control; Dreaming in Code uses foundation models to programmatically generate curriculum environments for open-ended learning.
2 sources
Paper compares prompt-based vs agent-based approaches for automating computational reproducibility in social science; Agentseed generates AGENTS.md files from codebases to help AI coding agents understand repos.
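Agentseed's heuristics aren't detailed here; a naive sketch of the general idea (scan the repo, emit an AGENTS.md that orients a coding agent), with the section layout and manifest detection as illustrative assumptions:

```python
from pathlib import Path

MANIFESTS = ["pyproject.toml", "package.json", "Cargo.toml", "go.mod", "Makefile"]
SKIP = {".git", "node_modules", "__pycache__"}

def generate_agents_md(repo: Path) -> str:
    lines = [f"# AGENTS.md for {repo.name}", "", "## Layout"]
    for child in sorted(repo.iterdir()):
        if child.name in SKIP or child.name.startswith("."):
            continue
        lines.append(f"- `{child.name}` ({'dir' if child.is_dir() else 'file'})")
    found = [m for m in MANIFESTS if (repo / m).exists()]
    if found:
        lines += ["", "## Build / tooling"] + [f"- `{m}` detected" for m in found]
    lines += ["", "## Conventions", "- (fill in: test command, lint, style rules)"]
    return "\n".join(lines)

print(generate_agents_md(Path(".")))
```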
Paper explores structured context engineering for SQL schemas up to 10,000 tables across models; separate discussion identifies offline/async LLM workloads (eval pipelines, dataset labeling) as highest-volume use cases rather than latency-sensitive ones.
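One structured-context strategy for a 10,000-table schema: score tables against the question and serialize only the top-k DDL into the prompt. The lexical-overlap scorer and k=20 cutoff below are illustrative assumptions; the paper compares richer structuring strategies across models:

```python
def score(question: str, table: str, columns: list[str]) -> int:
    # Crude relevance: overlap between question words and table/column vocab.
    q = set(question.lower().split())
    vocab = set(table.lower().split("_")) | {c.lower() for c in columns}
    return len(q & vocab)

def build_context(question: str, schema: dict[str, list[str]], k: int = 20) -> str:
    # schema maps table name -> column names; only the top-k tables survive
    # into the prompt instead of the full 10,000-table dump.
    ranked = sorted(schema, key=lambda t: score(question, t, schema[t]), reverse=True)
    return "\n".join(f"CREATE TABLE {t} ({', '.join(schema[t])});" for t in ranked[:k])
```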
Three papers independently address visual reasoning with structured intermediate steps: process reward models for thinking-with-images, annotation-free hierarchical synthetic CoT for VLMs, and adaptive test-time scaling with world models for spatial reasoning.
CoTZero eliminates annotation dependency for visual CoT, and process reward models now evaluate intermediate visual reasoning steps; the next bottleneck is scaling test-time compute adaptively without fixed step budgets (one stopping rule is sketched below).
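What an adaptive step budget can look like: extend the reasoning trace one step at a time and stop once the process reward model is confident, rather than running a fixed number of steps. Both model calls below are hypothetical stand-ins, not any paper's API:

```python
def adaptive_reason(question, propose_step, prm_score, threshold=0.9, max_steps=16):
    # propose_step: e.g. a VLM generating the next reasoning step given the
    # trace so far; prm_score: a process reward model scoring the trace.
    trace = []
    for _ in range(max_steps):
        trace.append(propose_step(question, trace))
        if prm_score(question, trace) >= threshold:
            break  # PRM is confident in the intermediate steps; stop early
    return trace
```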