Artificial Intelligence Reddit

Reddit’s home for Artificial Intelligence (AI)

  • What’s next if AGI does not happen?
    by /u/BubblyOption7980 on January 13, 2026 at 11:33 am

    Is all the talk about robotics, automated vehicles, and world models an acknowledgement that the LLM scaling era has plateaued? Is it time to focus on more realistic use cases than the AGI / Super-intelligence hype? submitted by /u/BubblyOption7980 [link] [comments]

  • European banks plan to cut 200,000 jobs as AI takes hold | TechCrunch
    by /u/msaussieandmrravana on January 13, 2026 at 9:58 am

    If AI does not improve human lives, who needs it? submitted by /u/msaussieandmrravana [link] [comments]

  • I bought an LG TV for the first time in my life, and it’s weird.
    by /u/a_decent_hooman on January 13, 2026 at 9:01 am

    It has its own AI bot, plus Alexa and Microsoft Copilot. Do I need all of them at the same time? I just don’t understand. None of them are removable. submitted by /u/a_decent_hooman [link] [comments]

  • Malaysia and Indonesia become the first countries to block Musk’s Grok over sexualized AI images
    by /u/APnews on January 13, 2026 at 8:24 am

    submitted by /u/APnews [link] [comments]

  • I treated job hunting and interviewing like a second job… so I built a lazy AI workflow
    by /u/Ok_Improvement7802 on January 13, 2026 at 7:49 am

    I used to prep by panic-googling at midnight, and it often ate my whole evening. Now I run this lazy AI workflow before interviews:
    – Perplexity – search “what happened with this company in the last 6 months? what are 3 risks they’re facing?” and ask for actual talking points.
    – ChatGPT – “based on this JD, give me 5 likely questions + STAR outline” prompts.
    – Glean – I drop my notes in there so they’re searchable later, e.g. “what did I learn about X company last time?” Helps when I have multiple interviews and my brain has turned to soup.
    – Coco career AI – honestly it helps before interviews, because the jobs it recommends to me are more aligned. submitted by /u/Ok_Improvement7802 [link] [comments]

  • Please Help! My father is being scammed!
    by /u/True_CrimePodcast on January 13, 2026 at 7:37 am

    The woman in the video is Larissa Liveir, a Brazilian guitarist sponsored by Gibson. I’m not sure whether the video was created with AI or not. The video was sent to my 70-year-old father by a scammer pretending to be her. I know the voice is not hers: she’s Brazilian and her native language is Portuguese. The real Larissa Liveir does speak English, but I assume with a heavy accent, and there’s no accent in this. Can someone please tell me if the video is AI? submitted by /u/True_CrimePodcast [link] [comments]

  • ChatGPT vs Claude Opus 4.5: coding performance breakdown (building a business website)
    by /u/Significant_Loss_541 on January 13, 2026 at 7:15 am

    While working on a business website I needed to figure out which model actually handles complex coding work better, so I ran some spatial-reasoning tests on ChatGPT o4 and Claude Opus 4.5 to see how they deal with messy legacy code and refactoring. Basically, I fed both models old code with tons of nested dependencies and asked them to refactor, identify bugs, and suggest better architecture. I did this over 15 different scenarios and tracked accuracy, context handling, and token usage to get a real picture. On 500+ line files, Claude was hitting ~85% accurate bug detection while ChatGPT o4 was around 72%. Refactoring quality had a bigger gap: Claude gave usable results ~78% of the time vs ChatGPT’s 65%. The thing that really stood out was context retention: Claude handled 8-10 files no problem, while ChatGPT started losing track after 5-6, especially with heavy cross-references. Token efficiency went to Claude too, ~120k tokens per full run vs ChatGPT’s 180k for the same task. Claude is just noticeably better at the spatial-reasoning side of code architecture; ChatGPT loses dependency chains quicker when everything references everything else. While digging around I came across Qwen3 Coder 480B on DeepInfra – apparently solid benchmarks for agentic coding tasks and performance pretty comparable to Claude. Keeping it on the list to try later, but we’re already hooked up with Claude and it’s working well enough right now. submitted by /u/Significant_Loss_541 [link] [comments]
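
    A minimal sketch of the kind of per-scenario bookkeeping described above. This is not the author’s actual harness: the records, field names, and numbers are illustrative assumptions, only the aggregation pattern (average the tracked metrics per model across scenarios) is the point.

```python
# Hypothetical sketch of aggregating per-scenario coding-benchmark results.
# The records and field names are made up for illustration.
from statistics import mean

# One record per (model, scenario): fraction of seeded bugs found,
# whether the refactor was usable, and tokens consumed by the full run.
runs = [
    {"model": "claude-opus-4.5", "bugs_found": 0.9, "refactor_usable": True,  "tokens": 118_000},
    {"model": "chatgpt-o4",      "bugs_found": 0.7, "refactor_usable": False, "tokens": 176_000},
    # ... one entry per model per scenario (15 scenarios in the post) ...
]

def summarize(model: str) -> dict:
    """Average the tracked metrics for one model across all scenarios."""
    rows = [r for r in runs if r["model"] == model]
    return {
        "bug_detection": mean(r["bugs_found"] for r in rows),
        "usable_refactors": mean(r["refactor_usable"] for r in rows),  # bools average to a rate
        "avg_tokens": mean(r["tokens"] for r in rows),
    }

for m in ("claude-opus-4.5", "chatgpt-o4"):
    print(m, summarize(m))
```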

  • It’s been a big week for Agentic AI; here are 10 massive developments you might’ve missed:
    by /u/SolanaDeFi on January 13, 2026 at 7:02 am

    OpenAI launches Health and Jobs agents. Claude Code 2.1.0 drops with 1096 commits. Cursor agent reduces tokens by 47%. A collection of AI agent updates! (yes, made by me, a human, lmao) 🧵
    1. Claude Code 2.1.0 Released with Major Agent Updates – 1096 commits shipped. Add hooks to agents & skills frontmatter, agents no longer stop on denied tool use, custom agent support, wildcard tool permissions, and multilingual support. Huge agentic workflow improvements.
    2. OpenAI Launches ChatGPT Health Agent – Dedicated space for health conversations. Securely connect medical records and wellness apps so responses are grounded in your health data. Designed to help navigate medical care, not replace it. Early access waitlist open. The personal health agent is now available.
    3. Cursor Agent Implements Dynamic Context – More intelligent context filling across all models while maintaining the same quality. Reduces total tokens by 46.9% when using multiple MCP servers. Their agent efficiency is now dramatically improved.
    4. Firecrawl Adds GitHub Search for Agents – Set category: “github” on /search to get repos, starter kits, and open-source projects with structured data in one call. Available in playground, API, and SDKs. Agents can now search GitHub programmatically.
    5. Anthropic Publishes Guide on Evaluating AI Agents – New engineering blog post: “Demystifying evals for AI agents.” Shares evaluation strategies from real-world deployments. Addresses why agent capabilities make them harder to evaluate. Best practices for agent evaluation released.
    6. Tailwind Lays Off 75% of Team Due to AI Agent Usage – The CSS framework became extremely popular with AI coding agents (75M downloads/mo), but agents don’t visit the docs where they promoted paid offerings. Result: 40% traffic drop, 80% revenue loss. Proves agents can disrupt business models.
    7. Cognition Partners with Infosys to Deploy Devin AI Agent – Infosys is rolling out Devin across its engineering organization and global client base. Early results show significant productivity gains, including complex COBOL migrations completed in record time. New enterprise deployment for coding agents.
    8. ERC-8004 Proposal: Trustless AI Agents Onchain – New proposal enables agents from different orgs to interact without pre-existing trust. Three registries: Identity (unique identifiers), Reputation (scoring system), Verification (independent validator checks). Infra for cross-organizational agent interaction.
    9. Early Look at Grok Build Coding Agent from xAI – Vibe-coding solution arriving as a CLI tool with web UI support on Grok. Initially launching as a local agent with a CLI interface; remote coding agents planned for later. xAI entering the coding agent competition.
    10. OpenAI Developing ChatGPT Jobs Career Agent – Help with resume tips, job search, and career guidance. Features: resume improvement and positioning, role exploration, job search and comparison. Follows the ChatGPT Health launch. What will they build once Health and Jobs are complete?
    That’s a wrap on this week’s agentic news. Which update impacts you the most? LMK what else you want to see | More weekly AI + agentic content releasing every week! submitted by /u/SolanaDeFi [link] [comments]

  • One-Minute Daily AI News 1/12/2026
    by /u/Excellent-Target-847 on January 13, 2026 at 5:48 am

    Apple teams up with Google Gemini for AI-powered Siri.[1]
    Anthropic announces Claude for Healthcare following OpenAI’s ChatGPT Health reveal.[2]
    Hyundai shows off K-pop dancing robot dogs and humanoid robot Atlas at CES.[3]
    Google announces a new protocol to facilitate commerce using AI agents.[4]
    Sources:
    [1] https://www.mercurynews.com/2026/01/12/apple-teams-up-with-google-gemini-for-ai-powered-siri/
    [2] https://techcrunch.com/2026/01/12/anthropic-announces-claude-for-healthcare-following-openais-chatgpt-health-reveal/
    [3] https://www.youtube.com/watch?v=G7oCXL4VxSE
    [4] https://techcrunch.com/2026/01/11/google-announces-a-new-protocol-to-facilitate-commerce-using-ai-agents/
    submitted by /u/Excellent-Target-847 [link] [comments]

  • Anthropic Cowork Launches: Claude Code Without Coding Skills
    by /u/i-drake on January 13, 2026 at 4:54 am

    submitted by /u/i-drake [link] [comments]

  • Pentagon is embracing Musk’s Grok AI chatbot as it draws global outcry
    by /u/esporx on January 13, 2026 at 4:41 am

    submitted by /u/esporx [link] [comments]

  • Cowork: Claude Code for the rest of your work
    by /u/eternviking on January 12, 2026 at 8:17 pm

    submitted by /u/eternviking [link] [comments]

  • The Intelligence Paradox: Why centralized AI is hitting a “Power Wall” and the case for decentralized inference hubs
    by /u/Foreign-Job-8717 on January 12, 2026 at 7:14 pm

    As we scale to GPT-5.2 and beyond, the energy footprint of centralized data centers in the US is becoming a physical limit. I’m theorizing that the next step isn’t “bigger models,” but smarter routing to specialized, regionally-hosted inference hubs. If we can’t shrink the models, we must optimize the path to the user. I’m curious about the community’s take on “Inference-at-the-edge” for LLMs. Is the future a single global brain, or a fragmented network of sovereign AI nodes? submitted by /u/Foreign-Job-8717 [link] [comments]

  • The bottleneck isn’t AI capability anymore. It’s human reception.
    by /u/Signal_Usual8630 on January 12, 2026 at 6:27 pm

    Somewhere between GPT-3.5 and Claude 3, something shifted. AI capability stopped being the constraint. The new bottleneck: Can humans understand enough to decide with confidence? After 416K messages over 2.5 years, I packaged this thesis into a “seed” — a JSON you paste into any LLM. Type “unpack” and explore 17 themes at your own pace. The singularity can’t happen. Not because AI isn’t smart enough. Because humans won’t use what they can’t verify. https://github.com/mordechaipotash/thesis submitted by /u/Signal_Usual8630 [link] [comments]

  • Multimodal LLMs are the real future of AI (especially for robotics)
    by /u/Upset-Pop1136 on January 12, 2026 at 11:22 am

    I strongly believe multimodal LLMs (AI that can understand text, images, audio, and actions) are the next big step in AI. Right now, most LLMs are mainly used for chatting. But I think the real breakthrough will happen in robotics, where AI needs to see, hear, and act in the real world. Think about it: every robot already has (or will have) sensors:
    – Cameras (drones, vehicles, humanoid robots)
    – Microphones
    – Depth sensors / LiDAR
    – GPS / IMU
    – Maybe even tactile sensors
    A robot doesn’t just need to talk, it needs to:
    – see the world
    – understand scenes
    – reason about physical space
    – plan actions and execute in real time
    And multimodal models are basically built for this. I feel like as robotics advances accelerate, the demand for multimodal intelligence is going to explode, because robots are not operating inside a browser, they’re operating in the real world. I’m building in this space. What’s your opinion on the future of multimodal LLMs? submitted by /u/Upset-Pop1136 [link] [comments]

  • What is something current AI systems are very good at, but people still don’t trust them to do?
    by /u/seenmee on January 12, 2026 at 4:07 am

    We see benchmarks and demos showing strong performance, but hesitation still shows up in real use. Curious where people draw the trust line and why, whether it’s technical limits, incentives, or just human psychology. submitted by /u/seenmee [link] [comments]

  • I built Plano – the framework-agnostic runtime data plane for agentic applications
    by /u/AdditionalWeb107 on January 12, 2026 at 12:21 am

    Thrilled to be launching Plano today – delivery infrastructure for agentic apps: an edge and service proxy server with orchestration for AI agents. Plano’s core purpose is to offload all the plumbing work required to deliver agents to production so that developers can stay focused on core product logic. Plano runs alongside your app servers (cloud, on-prem, or local dev), deployed as a side-car, and leaves GPUs where your models are hosted.

    The problem: AI practitioners on the ground will tell you that calling an LLM is not the hard part. The really hard part is delivering agentic applications to production quickly and reliably, then iterating without rewriting system code every time. In practice, teams keep rebuilding the same concerns that sit outside any single agent’s core logic. This includes model agility – the ability to pull from a large set of LLMs and swap providers without refactoring prompts or streaming handlers. Developers need to learn from production by collecting signals and traces that tell them what to fix. They also need consistent policy enforcement for moderation and jailbreak protection, rather than sprinkling hooks across codebases. And they need multi-agent patterns to improve performance and latency without turning their app into orchestration glue. These concerns get rebuilt and maintained inside fast-changing frameworks and application code, coupling product logic to infrastructure decisions. It’s brittle, and it pulls teams away from core product work into plumbing they shouldn’t have to own.

    What Plano does: Plano moves core delivery concerns out of process into a modular proxy and data plane designed for agents. It supports inbound listeners (agent orchestration, safety and moderation hooks), outbound listeners (hosted or API-based LLM routing), or both together. Plano provides the following capabilities via a unified data plane:
    – Orchestration: Low-latency routing and handoff between agents. Add or change agents without modifying app code, and evolve strategies centrally instead of duplicating logic across services.
    – Guardrails & Memory Hooks: Apply jailbreak protection, content policies, and context workflows (rewriting, retrieval, redaction) once via filter chains. This centralizes governance and ensures consistent behavior across your stack.
    – Model Agility: Route by model name, semantic alias, or preference-based policies. Swap or add models without refactoring prompts, tool calls, or streaming handlers.
    – Agentic Signals™: Zero-code capture of behavior signals, traces, and metrics across every agent, surfacing traces, token usage, and learning signals in one place.
    The goal is to keep application code focused on product logic while Plano owns delivery mechanics.

    More on architecture: Plano has two main parts.
    – An Envoy-based data plane. It uses Envoy’s HTTP connection management to talk to model APIs, services, and tool backends. We didn’t build a separate model server – Envoy already handles streaming, retries, timeouts, and connection pooling. Some of us are core Envoy contributors at Katanemo.
    – Brightstaff, a lightweight controller and state machine written in Rust. It inspects prompts and conversation state, decides which agents to call and in what order, and coordinates routing and fallback. It uses small LLMs (1–4B parameters) trained for constrained routing and orchestration. These models do not generate responses and fall back to static policies on failure.
The models are open sourced here: https://huggingface.co/katanemo submitted by /u/AdditionalWeb107 [link] [comments]
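
    To make the side-car pattern described in the Plano post above concrete, here is a minimal, hypothetical sketch of an app calling an LLM through a local proxy by model alias. The endpoint, port, payload fields, and alias are illustrative assumptions, not Plano’s documented API; the point is only that the app talks to a local data plane, which owns routing, guardrails, and observability, instead of calling model providers directly.

```python
# Hypothetical illustration of the side-car/proxy pattern: the application
# sends requests to a local data plane, which resolves a semantic model alias
# to an actual provider/model. The URL, port, and payload shape are assumptions,
# NOT Plano's documented API.
import requests

PROXY_URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local side-car endpoint

def ask(prompt: str, model_alias: str = "fast-coder") -> str:
    """Send a chat request to the local proxy; the proxy decides which
    provider/model actually serves the alias."""
    resp = requests.post(
        PROXY_URL,
        json={
            "model": model_alias,  # semantic alias, resolved by the proxy
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style response shape for illustration only.
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize yesterday's error logs."))
```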

  • China is closing in on US technology lead despite constraints, AI researchers say
    by /u/esporx on January 11, 2026 at 11:08 pm

    submitted by /u/esporx [link] [comments]

  • Song detection including release date
    by /u/stickywinger on January 11, 2026 at 5:31 pm

    I have an old collection of music, around 20-30 years old, on my hard drive, and some of it is unnamed or missing other info. I’ve slowly started sorting through it, but by far the most time-consuming part is manually finding the artist and title or the release date. (Not all of the tracks are unnamed/undated, but a good chunk are.) Is there any AI, or something like it, that can scan my file explorer and find/rename/date the tracks? I’d also be happy to scan them one by one if it meant I could find the correct info for them. submitted by /u/stickywinger [link] [comments]

  • What’s your wild take on the rise of AI?
    by /u/milicajecarrr on January 11, 2026 at 3:02 pm

    We have entered an era of AI doing _almost_ anything: from vibe coding to image/video creation to a new age of SEO, etc. But what do you think AI will be able to do in the near future? Just a few years ago we were laughing at people saying AI would be able to make apps, for example, or do complex mathematical calculations, and here we are haha. So what’s your “wild take” that some people might laugh at, but that is 100% achievable in the future? submitted by /u/milicajecarrr [link] [comments]

  • One-Minute Daily AI News 1/10/2026
    by /u/Excellent-Target-847 on January 11, 2026 at 5:50 am

    Meta signs nuclear energy deals to power Prometheus AI supercluster.[1]
    OpenAI is reportedly asking contractors to upload real work from past jobs.[2]
    Meta and Harvard Researchers Introduce the Confucius Code Agent (CCA): A Software Engineering Agent that can Operate at Large-Scale Codebases.[3]
    X could face UK ban over deepfakes, minister says.[4]
    Sources:
    [1] https://www.cnbc.com/2026/01/09/meta-signs-nuclear-energy-deals-to-power-prometheus-ai-supercluster.html
    [2] https://techcrunch.com/2026/01/10/openai-is-reportedly-asking-contractors-to-upload-real-work-from-past-jobs/
    [3] https://www.marktechpost.com/2026/01/09/meta-and-harvard-researchers-introduce-the-confucius-code-agent-cca-a-software-engineering-agent-that-can-operate-at-large-scale-codebases/
    [4] https://www.bbc.com/news/articles/c99kn52nx9do
    submitted by /u/Excellent-Target-847 [link] [comments]

  • Alignment tax isn’t global: a few attention heads cause most capability loss
    by /u/FinnFarrow on January 10, 2026 at 6:47 pm

    submitted by /u/FinnFarrow [link] [comments]

  • Geoffrey Hinton says LLMs are no longer just predicting the next word – new models learn by reasoning and identifying contradictions in their own logic. This unbounded self-improvement will “end up making it much smarter than us.”
    by /u/MetaKnowing on January 10, 2026 at 5:54 pm

    submitted by /u/MetaKnowing [link] [comments]

  • A deep dive into how I trained an edit model to show highly relevant code suggestions while programming
    by /u/National_Purpose5521 on January 10, 2026 at 5:48 pm

    This is definitely interesting for all SWEs who would like to know what goes on behind the scenes in your code editor when you hit `Tab`. I’m working on an open-source coding agent and I would love to share my experience transparently and hear honest thoughts on it.

    For context, NES is designed to predict the next change your code needs, wherever it lives. Honestly, when I started building this, I realised it is much harder to achieve than it sounds, since NES considers the entire file plus your recent edit history and predicts how your code is likely to evolve: where the next change should happen, and what that change should be. Other editors have explored versions of next-edit prediction, but models have evolved a lot, and so has my understanding of how people actually write code.

    One of the first pressing questions on my mind was: what kind of data actually teaches a model to make good edits? It turned out that real developer intent is surprisingly hard to capture. As anyone who’s peeked at real commits knows, developer edits are messy. Pull requests bundle unrelated changes, commit histories jump around, and the sequences of edits often skip the small, incremental steps engineers actually take when exploring or fixing code.

    To train an edit model, I formatted each example using special edit tokens. These tokens are designed to tell the model:
    – What part of the file is editable
    – The user’s cursor position
    – What the user has edited so far
    – What the next edit should be inside that region only
    Unlike chat-style models that generate free-form text, I trained NES to predict the next code edit inside the editable region. So, for example, when the developer makes the first edit, it allows the model to capture the user’s intent. The `editable_region` markers define everything between them as the editable zone. The `user_cursor_is_here` token shows the model where the user is currently editing. NES infers the transformation pattern (capitalization, in this case) and applies it consistently as the next edit sequence (a rough sketch of this example format follows below).

    To support this training format, I used CommitPackFT and Zeta as data sources. I normalized this unified dataset into the same Zeta-derived edit-markup format described above and applied filtering to remove non-sequential edits using a small in-context model (GPT-4.1 mini).

    Now that I had the training format and dataset finalized, the next major decision was choosing which base model to fine-tune. Initially, I considered both open-source and managed models, but ultimately chose Gemini 2.5 Flash Lite for two main reasons:
    – Easy serving: Running an OSS model would require me to manage its inference and scalability in production. For a feature as latency-sensitive as Next Edit, these operational pieces matter as much as the model weights themselves. Using a managed model helped me avoid all that operational overhead.
    – Simple supervised fine-tuning: I fine-tuned NES using Google’s Gemini Supervised Fine-Tuning (SFT) API, with no training loop to maintain, no GPU provisioning, and at the same price as the regular Gemini inference API. Under the hood, Flash Lite uses LoRA (Low-Rank Adaptation), which means I need to update only a small set of parameters rather than the full model. This keeps NES lightweight and preserves the base model’s broader coding ability.
    Overall, in practice, using Flash Lite gave me model quality comparable to strong open-source baselines, with the obvious advantage of far lower operational costs. This keeps the model stable across versions.
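
    To make the edit-token format described above concrete, here is a minimal, hypothetical sketch of how one training example might be laid out. The `editable_region` and `user_cursor_is_here` markers are the ones named in the post; the exact delimiter spelling, the helper function, and the sample code are illustrative assumptions, not the post’s actual data format.

```python
# Hypothetical sketch of one training example in an edit-markup format.
# The marker names come from the post; the exact delimiter syntax and the
# surrounding code are assumptions for illustration.

def build_training_example() -> dict:
    # The file as the model sees it: only the span between the editable_region
    # markers may be changed; the cursor token marks where the user is typing.
    prompt = (
        "def get_user(id):\n"
        "    return db.lookup(id)\n"
        "<|editable_region_start|>\n"
        "def get_order(id):\n"
        "    return db.lookup(id)<|user_cursor_is_here|>\n"
        "<|editable_region_end|>\n"
    )
    # The target is the rewritten editable region only, i.e. the "next edit"
    # (here, propagating a rename the user just started).
    completion = (
        "def get_order(order_id):\n"
        "    return db.lookup(order_id)\n"
    )
    return {"prompt": prompt, "completion": completion}

if __name__ == "__main__":
    example = build_training_example()
    print(example["prompt"])
    print("---")
    print(example["completion"])
```
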
    And on the user side, using Flash Lite directly improves the experience in the editor. As a user, you can expect faster responses and likely lower compute cost (which can translate into a cheaper product). And since fine-tuning is lightweight, I can roll out frequent improvements, providing a more robust service with less risk of downtime, scaling issues, or version drift, meaning greater reliability for everyone.

    Next, I evaluated the edit model using a single metric: LLM-as-a-judge, powered by Gemini 2.5 Pro. This judge model evaluates whether a predicted edit is semantically correct, logically consistent with recent edits, and appropriate for the given context. This is unlike token-level comparisons and is far closer to how a human engineer would judge an edit. In practice, it gave me an evaluation process that is scalable, automated, and far more sensitive to intent than simple string matching, and it lets me run large evaluation suites continuously as I retrain and improve the model.

    But training and evaluation only define what the model knows in theory. To make Next Edit Suggestions feel alive inside the editor, I realised the model needs to understand what the user is doing right now. So at inference time, I give the model more than just the current file snapshot. I also send:
    – The user’s recent edit history: wrapped in `<|edit_history|>`, this gives the model a short story of the user’s current flow: what changed, in what order, and what direction the code seems to be moving.
    – Additional semantic context: added via `<|additional_context|>`, this might include type signatures, documentation, or relevant parts of the broader codebase. It’s the kind of stuff you would mentally reference before making the next edit.
    NES combines these inputs to infer the user’s intent from earlier edits and predict the next edit inside the editable region only (a rough sketch of how these inputs might be assembled follows below). I’ll probably write more on how I constructed, ranked, and streamed these dynamic contexts, but I would love to hear feedback and whether there is anything I could have done better. submitted by /u/National_Purpose5521 [link] [comments]
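
    As a companion to the inference-time inputs listed above, here is a minimal, hypothetical sketch of how the extra context might be assembled into a single prompt. The `<|edit_history|>` and `<|additional_context|>` wrappers are the ones named in the post; the function name, argument names, closing tags, and layout are illustrative assumptions.

```python
# Hypothetical sketch of assembling inference-time context for a next-edit model.
# The <|edit_history|> and <|additional_context|> wrappers are named in the post;
# the function, argument names, closing tags, and layout are assumptions.
from typing import List

def build_inference_prompt(
    file_snapshot: str,       # current file, with editable-region/cursor markers already inserted
    recent_edits: List[str],  # most recent edits, e.g. small unified-diff hunks
    extra_context: List[str], # type signatures, docs, related snippets
) -> str:
    history = "\n".join(recent_edits)
    context = "\n".join(extra_context)
    return (
        f"<|edit_history|>\n{history}\n<|/edit_history|>\n"
        f"<|additional_context|>\n{context}\n<|/additional_context|>\n"
        f"{file_snapshot}"
    )

if __name__ == "__main__":
    prompt = build_inference_prompt(
        file_snapshot=(
            "<|editable_region_start|>\n"
            "def total(xs):<|user_cursor_is_here|>\n"
            "<|editable_region_end|>"
        ),
        recent_edits=["- def sum(xs):", "+ def total(xs):"],
        extra_context=["xs: list[int]"],
    )
    print(prompt)
```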
