07-24-Daily AI News Daily
AI Daily News 2025/7/24
AI Daily
|Updated at 8 AM
|Aggregated Data Across the Web
|Frontier Science Exploration
|Industry Free Expression
|Open Source Innovation
|AI and Human Future
| Visit Web Version
AI Product Showcase: GeminiCli2API
GeminiCli2API offers the perfect solution if you’ve ever felt constrained by the strict rate limits of Google Gemini’s official free API, or if you’ve been itching to seamlessly integrate Gemini’s powerful capabilities into your favorite third-party apps! 🎉
This clever local proxy, GeminiCli2API, wraps the more lenient Gemini CLI into a standard, OpenAI-compatible API service. This means you can finally break through the official free API’s rate limits, enjoying higher request quotas thanks to your Google account authorization. Now, you can develop, test, and create to your heart’s content, saying goodbye to those annoying “Quota Exceeded” errors!
But the real magic of GeminiCli2API lies in its “surgical-level” control over System Prompts. This feature is a game-changer:
- ✍️ Override: You can set a global “golden prompt” to force all connected applications to use it, ensuring absolute uniformity in AI roles and output styles.
- ➕ Append: You can subtly “append” an extra layer of your instructions to the client’s original system prompt, allowing for fine-tuning rules and enhancing capabilities without the client even knowing.
- 🔍 Extract & Audit: Easily log all prompts passing through the proxy, making it simple to analyze, debug, and optimize, or even build your own high-quality datasets.
With just a few simple configuration steps, you can connect any OpenAI-compatible tool, like LobeChat or NextChat, to this local “enhanced” Gemini service. GeminiCli2API isn’t just a proxy; it’s a powerful toolbox in your hands to master and tame AI. Go ahead, give it a try! ✨
AI Content Summary
Kai-Fu Lee launched AI agent "Wanzai"; Google released a faster, cheaper new model.
Kuaishou and Shanghai Jiao Tong University open-sourced multimodal model Orthus; Kunlun Wanwei upgraded its AI music platform.
Frontier research focuses on breaking large model context limits and enhancing AI's long-range reasoning capabilities.
In industry news, Amazon Web Services disbanded its AI research institute in Shanghai.
Meanwhile, AI has sparked data privacy ethical controversies and widespread AI anxiety in the workplace.
AI Product and Feature Updates
It’s a big deal! “Wanzai”, the first enterprise-grade AI agent from Zero-One Technology, steered by Kai-Fu Lee, has officially been unveiled. This isn’t just another chatty chatbot; it’s precisely positioned as a “super employee” capable of deep thinking, autonomous planning, and executing complex tasks. By seamlessly integrating with vast internal knowledge bases and critical external services, “Wanzai” aims for a stunning transformation from a passive “tool that takes orders” to an active “decision-maker that delivers results.” Kai-Fu Lee confidently predicts that AI agents are evolving from performing simple workflows (L1) to reasoning agents with autonomous planning capabilities (L2), ultimately moving towards a grand vision where multiple AIs collaborate to completely reshape corporate operations (L3). Looks like your cubicle buddy in the future might really not be human anymore! This industry shift is what this AI News deeply tracks.
Google has unleashed another big gun! They’ve officially released the stable version of Gemini 2.5 Flash-Lite, proudly claiming it’s their fastest and lowest-cost AI model to date – a perfect mediator between performance and your wallet ✨. This new model doesn’t just strike an incredible balance between performance and cost; it also natively supports an astonishing context length of up to 1 million tokens, making it a super chatty, memory-hoarding genius. What’s even more enticing is its highly competitive pricing strategy: just $0.10 per million input tokens, undeniably launching a fierce price war against all competitors. Developers, are you ready for this sweeping value-for-money storm? Friendly reminder: the old preview alias will officially be deprecated on August 25th, so make sure to update your code ASAP to avoid service interruptions!
What happens when a short-video giant meets a top-tier university? The answer is Orthus! Kuaishou and Shanghai Jiao Tong University jointly unveiled this new multimodal model named Orthus at the top-tier International Conference on Machine Learning (ICML), generously open-sourcing it for global developers. This newcomer, built on an advanced autoregressive Transformer architecture, not only freely navigates and excels across text and image modalities but also surpasses predecessors like Chameleon in various mainstream image understanding benchmarks with astonishing computational efficiency. What’s even more jaw-dropping is its ability to defeat SDXL, a heavyweight model specifically designed for image generation, on the text-to-image metric, truly making it a gifted cross-domain prodigy. This breakthrough undoubtedly declares that the boundaries of multimodal AI are far wider and vaster than we imagined, with future possibilities being simply limitless.
The domestic AI music scene is buzzing again! Kunlun Wanwei’s AI music creation platform, Mureka, has received a major V7 version upgrade. Its overall performance has now surpassed the popular overseas Suno app in several key dimensions, showcasing serious technical prowess 🎶. The biggest highlight of the new version is its self-developed music Chain of Thought technology, “MusiCoT.” This innovative tech allows the AI, before starting creation, to “deeply ponder” the entire song’s structure, mood, and melodic direction, much like a human composer, resulting in more coherent and emotionally richer musical pieces. Users can not only generate songs with simple text descriptions but also upload audio samples to mimic specific vocal tones, and even generate a “folksy” style music video with a single click – entertainment level maxed out! From this in-depth review - AI News, it’s clear that AI music is confidently moving from the early “listenable” stage to an advanced stage of being “pleasant” and truly infectious. The future music creation ecosystem will undoubtedly become more diverse and exciting because of it.
Still racking your brain trying to explain abstract concepts like “bubble sort” or “entropy increase” to students or clients? Worry not, a savior has arrived! A revolutionary AI animation engine called Fogsight has burst onto the scene, dedicated to tackling all kinds of abstruse abstract concepts 🤔. Users just need to input a keyword, and Fogsight works its magic, automatically generating a professional educational animation with complete narrative logic, excellent visual effects, and even thoughtfully provided bilingual narration. This powerful tool, built on advanced large language models, not only enables one-click smart generation but also offers a convenient conversational interface for easy fine-tuning and modification. What’s even more exciting is that as part of the renowned WaytoAGI Open Source Project - AI News, it fully supports local deployment, providing educators and content creators worldwide with an unprecedented super tool that can truly revolutionize traditional creative workflows.
AI Frontier Research
For a long time, research into semantic segmentation for images and videos in AI has been like two parallel lines that never meet. Researchers worked in silos, lacking a unified theoretical framework, which undoubtedly hampered the development of general vision technology. Now, that situation has finally been broken! Researchers from several top universities have teamed up to propose the first framework capable of unifying these two heterogeneous data types: QuadMix. At its core is a highly creative “Four-way mixing” mechanism, which cleverly constructs rich and diverse intermediate domain representations between the source and target data domains, effectively reducing the vast differences in cross-domain learning. This research is incredibly significant; it has not only theoretically unified previously fractured research paths but has also set new records - AI News in multiple industry standard benchmarks, laying a solid foundation for building more versatile and powerful multimodal perception systems in the future.
The limited context window of large language models (LLMs) has always been their “Achilles’ heel” when tackling complex long-range reasoning tasks, severely restricting their deep thinking capabilities. However, a paper titled “Beyond Context Limits: Subconscious Clues for Long-Range Reasoning” AI News has brought us a glimmer of hope. Researchers have proposed the innovative TIM (Thread Inference Model), which mimics how the human brain processes complex information. It cleverly breaks down a large problem into a “reasoning tree” and only retains the “subconscious clues” most relevant to the current step in its “working memory” 🤔💡. This smart mechanism enables the model to handle virtually infinite working memory and complex scenarios requiring multi-step tool calls, performing exceptionally well in math and information retrieval tasks that demand extensive long-range reasoning. It truly opens up a highly promising new path to definitively solve the “goldfish memory” problem of LLMs.
It’s not hard for AI to draw an image and “Photoshop” an object onto a person’s hand. But making that image look like the person is genuinely “holding,” “lifting,” or “using” the object, with that natural sense of interaction, has been incredibly difficult to achieve. However, a recent study titled “HOComp: Interaction-Aware Human-Object Composition” AI News proposes an incredibly clever solution. This method first leverages powerful Multimodal Large Language Models (MLLMs) to deeply understand the type of interaction between humans and objects, such as whether it’s a “tight grip” or a “gentle support.” Subsequently, it meticulously adjusts the human posture to achieve the most natural interaction effect, while employing various carefully designed loss functions to ensure high consistency in appearance between the added object and the background. This ultimately elevates the realism and credibility of composite images to a whole new level, marking a significant step towards truly lifelike AI content generation.
AI Industry Outlook and Social Impact
Tech giants, in their relentless pursuit of breakthroughs, have once again clashed fiercely with the boundaries of personal privacy. xAI, Elon Musk’s AI company, was recently exposed for massively collecting facial data from over 200 employees through an internal project called “Skippy” to train its core Grok model. The stated goal of this project is for AI to better understand and recognize complex human emotions. Although xAI claims all data collection was done with signed employee consent and promised for internal training only, the “permanent” access clause in the agreement still sparked widespread concern and unease among employees regarding privacy security and the potential abuse of portrait rights 😬. This incident not only led to the controversial virtual figures Ani and Rudi but also once again thrust the difficult balance between tech giants’ innovative impulse and ethical responsibility into the public spotlight. This AI News also reminds us that technological development requires more robust regulations to safeguard it.
The AI wave is sweeping through workplaces globally with unstoppable force, simultaneously giving rise to some hilariously novel forms of “performance art.” According to a recent survey by Howdy.com, about 16% of U.S. employees frankly admit they “pretend” to use AI at work, solely to meet their superiors’ expectations for technological innovation and project an image of being tech-savvy. Behind this phenomenon lies widespread AI anxiety permeating the workplace: over one-fifth of employees feel uneasy about using AI but are pressured to adopt a stance of “embracing” new tech. Even more interestingly, another survey reveals the flip side of the coin: nearly half of employees who actually use AI in their work choose to keep it a secret from their bosses, fearing they might be perceived as lazy or lacking in capability. This unfolding workplace “metamorphosis” profoundly reveals the significant chasm between the speed of technology adoption and employees’ skill sets and psychological adaptation.
Some bittersweet AI News has emerged: Amazon Web Services (AWS) has officially confirmed that its AI Research Institute in Shanghai has been disbanded. This was AWS’s last overseas research institute globally. Dr. Wang Minjie, the institute’s Chief Applied Scientist, expressed profound sentiment on social media, stating he was “lucky to have caught the golden era of foreign enterprise research institutes in China.” Amazon’s official response called it a “difficult decision,” aiming to streamline teams and optimize global resource allocation to concentrate investments more effectively in core innovation areas. However, this move has undoubtedly sparked widespread concern and intense discussion in the industry regarding whether foreign enterprises’ R&D strategies in China are fully contracting. It also seems to herald the quiet end of a golden era where foreign capital led China’s frontier technological exploration.
Top Open Source Projects
moby - AI News (⭐70.1k): Imagine moby as the ultimate “Lego” brick treasure trove for the containerized world! This collaborative project, initiated and led by Docker, provides a complete set of standardized core components, allowing you to freely assemble and customize complex container-based systems like building with blocks. It’s an indispensable cornerstone for building all modern cloud-native applications.
OpenBB - AI News (⭐44.7k): OpenBB is a professional-grade investment research terminal striving to be accessible to everyone. It cleverly integrates massive, complex financial data and professional analytical tools into a fully open-source platform. Its grand vision is to completely break down information barriers and truly democratize investment research.
hyperswitch - AI News (⭐22.3k): hyperswitch is an open-source payment “super switch” meticulously built with the high-performance language Rust. It aims to make enterprise payment processes faster, more reliable, and more affordable than ever before, helping merchants easily connect and intelligently manage multiple payment channels, completely saying goodbye to the hassle of being “tied down” by a single payment gateway.
jj - AI News (⭐17.9k): jj is a brave new-generation version control system that boldly claims to be simpler and more powerful than Git. It’s not only fully compatible with Git, allowing for seamless switching, but also offers a far more user-friendly experience and a suite of powerful new features than its predecessors. Maybe it’s the next “can’t live without it” tool for developers worldwide! 🔥
ConvertX - AI News (⭐5.9k): Think of ConvertX as your personal “universal factory” for file conversion. This fully self-hostable online file converter is powerful enough to support mutual conversion of over 1000 file formats, allowing you to easily transform any file format while ensuring absolute data privacy and security.
PakePlus - AI News (⭐4.8k): Witness a miracle! This amazing tool, PakePlus, can package any website or web project into an ultra-lightweight desktop and mobile application, smaller than 5MB, in just a few minutes. For developers looking to quickly achieve cross-platform product deployment, this is undoubtedly a highly efficient shortcut.
hrms - AI News (⭐3.1k): hrms is a feature-complete open-source Human Resources and Payroll Management System. It provides a comprehensive and powerful HR solution for small and medium-sized businesses, allowing them to fully control all core HR tasks, from detailed employee management to complex payroll distribution, greatly improving management efficiency.
Social Media Share
An experienced engineer recently shared her deep concerns on Jike - AI News: A intern in her team was relying entirely on LLMs to write code, which ultimately led to a project riddled with bugs, and the intern themselves couldn’t explain the core logic behind the code at all. She sharply pointed out that AI should be a powerful tool to aid human deep thinking, not a shortcut that allows skipping the fundamental learning process. Young engineers who rely too early on models and neglect a solid understanding of underlying logic are highly susceptible to falling into the ethereal “vibe coding” trap. This, she warned, “is genuinely dangerous” for long-term personal career growth.
User wwwgoubuli deeply reviewed ByteDance’s AI programming tool, Trae, on X - AI News. He believes that while Trae’s performance in the full-lifecycle closed-loop “solo mode” is merely “on par” with other competitors and hasn’t created a generational gap, its product interface design is “radical yet exceptionally reasonable.” This results in an overall experience that is unparalleled among similar domestic products. He couldn’t help but exclaim that ByteDance’s product prowess truly lives up to its reputation, being powerful enough to inspire awe.
A developer on the X platform showered praise on Lovart.ai - AI News, hailing it as the world’s first true “Design Agent,” far beyond a simple image-making tool. This AI can think independently and fully execute a series of complex design tasks, from brand logo design and building an entire brand visual system to video ad creation and 3D model production. This undoubtedly loudly proclaims that a new AI-driven design era has arrived.
User Li Jigang shared a profoundly poetic and philosophical Prompt on X - AI News, aimed at guiding AI to act as a “language alchemist” for meticulously naming new products. This Prompt deeply emphasizes that a good name is “a container capable of holding grand dreams,” and should pursue “a triple resonance between sound, form, and meaning.” Its lofty literary realm and profound intent make it a rare work of art in the field of Prompt engineering.
If you’re eager for your AI-generated images to possess astonishing visual texture, then the clever trick user Xiangyang Qiaomu shared on X - AI News is an absolute must-see. He generously shared a Prompt specifically for Claude that can consistently generate stunning, crystal-clear, light-and-shadow-intertwined 3D frosted glass card effects. Even better, he included a link to detailed instructions and impressive example images, practically holding your hand as you become an AI painting master.
After “Big Tech High-P” (high-level professionals), the next aspirational status symbol for countless individuals might just be “independent researcher.” User wwwgoubuli observed an interesting phenomenon on X - AI News: Many renowned GitHub project authors and academic bigwigs, after choosing to join top tech companies like ByteDance or OpenAI, seem to have their publicly published academic papers and active open-source contributions “vanish into thin air.” People can then only occasionally glimpse their latest research updates on these companies’ official blogs or executives’ tweets. This observation sparks profound reflection on the relationship between open innovation and internal corporate R&D.
In the AI era, how should one choose future professional paths? A college freshman, about to embark on their university journey, posted for help on Reddit - AI News. They’re grappling with a dilemma between two seemingly traditional majors: life sciences and agriculture. However, their concern isn’t about which major is currently hotter or offers easier employment, but rather which one can better synergize and co-develop with AI technology in the future, instead of being mercilessly replaced by it. This question reveals Gen Z’s deep thinking and foresight regarding future technological and societal changes, and this AI News item is truly worth pondering.
A developer on Reddit excitedly launched PHOAI, an AI photo editor - AI News! The coolest thing about this app is its ability to directly translate natural language instructions like “turn me into an anime character” into stunning visual effects. More crucially, all image processing runs efficiently on the user’s device locally, without needing cloud uploads. This not only ensures user privacy but also fully demonstrates the smooth experience and enormous potential brought by edge AI applications. 🤩
Want to systematically learn how to make LLMs “cite sources” and speak with substance when answering? Then this new course on Retrieval-Augmented Generation (RAG) - AI News is absolutely unmissable! 🎓 RAG technology significantly enhances the factual accuracy of large model responses by intelligently retrieving and injecting relevant information from external knowledge bases before the model generates its answer. It also effectively avoids the costly and time-consuming process of model retraining, making it a critical core technology for building production-grade AI applications today.
Listen to the Audio Version of AI Daily News
🎙️ Xiaoyuzhou | 📹 Douyin |
---|---|
Next Life Tavern | Self-Media Account |
![]() | ![]() |