Kernel developers have removed code from the Linux kernel after receiving security reports generated by large language models. The reports identified potential vulnerabilities, though some were false positives. This highlights the growing use of AI tools in security auditing.
#llm
The Scraping Wiki is an LLM-maintained knowledge base that indexes approximately 400 articles about web scraping, serving as a comprehensive resource for scraping techniques and tooling.
Cohorte AI open-sourced a six-library Python governance stack for AI agents under Apache 2.0, covering reliability certification, policy enforcement, context routing, knowledge orchestration, monitoring, and identity management. The stack was built from 60+ enterprise deployments and includes a free playbook.
A new benchmark tested 18 large language models on OCR tasks using over 7,000 calls, finding that cheaper models often outperformed more expensive ones in accuracy and cost-efficiency.
This video explains why AI models, particularly large language models, produce hallucinations—confidently generating false or nonsensical information—due to their statistical nature, training data limitations, and lack of true understanding or grounding in reality.
Eridani-speak is a tool that enables large language models to communicate using the fictional language created by the alien character Rocky from the novel Project Hail Mary. The project allows LLMs to generate text in the unique linguistic style featured in the book.
AI-coded applications are increasingly being built as isolated systems, lacking integration and interoperability with other tools and platforms. This creates "islands" of functionality that can hinder collaboration, data sharing, and overall efficiency in software development ecosystems.
LibreThinker is a free AI assistant extension for LibreOffice Writer that adds an AI copilot to the sidebar. It connects to a free online LLM by default with no signup required, supports various API keys (Anthropic, Gemini, OpenAI, etc.), and can connect to self-hosted Ollama instances. The extension has amassed over 10,000 installs since its release four months ago.
ModelX is a prediction exchange where LLMs trade derivative contracts using fake money to evaluate their potential for trading decisions. The platform features Market Makers who post quotes and Hedge Funds who send market orders in 30-minute sealed-auction cycles. Initial observations show models struggle with maintaining consistent positional views.
Agent Harness Engineering is a framework for building AI agents that combines prompt engineering with system design principles. It focuses on creating reliable, scalable agents through structured workflows and systematic testing approaches.
Pioneer offers tools to customize and optimize large language models, letting users fine-tune models for their specific applications and use cases.
Microsoft's Copilot Flex Routing for EU and EFTA customers ensures that large language model data processing occurs within the European Union and European Free Trade Association regions. This routing option helps organizations meet data residency requirements by keeping data processing within specified geographic boundaries.
The author completed training a GPT-2-like model in 44 hours on a local machine, achieving performance close to GPT-2 small. Through systematic testing of various interventions, they identified learning rate adjustments and dropout removal as most effective for improving model loss. The author plans to next implement an LLM from scratch using JAX without reference to their book.
The LLM Position Bias Benchmark introduces a swapped-order pairwise judging method to measure position bias in large language models. This approach helps quantify how model preferences change when the order of options is reversed in pairwise comparisons.
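The swapped-order idea can be sketched in a few lines: judge each pair twice with the options in both orders, and count how often the winning *answer* (not the slot label) changes. The `judge` function below is a deterministic toy stand-in, not the benchmark's actual LLM call.

```python
# Sketch of swapped-order pairwise judging. judge() is a hypothetical
# stand-in for an LLM judge call; the benchmark's real API is not shown.

def judge(prompt: str, first: str, second: str) -> str:
    """Toy judge: prefers the longer answer, with a simulated
    position bias toward the first slot on ties."""
    if len(first) > len(second):
        return "first"
    if len(second) > len(first):
        return "second"
    return "first"  # tie -> biased toward whichever option came first

def position_flip_rate(prompt, pairs):
    """Fraction of pairs whose winner changes when A/B order is swapped.
    An unbiased judge picks the same underlying answer either way."""
    flips = 0
    for a, b in pairs:
        v1 = judge(prompt, a, b)            # A shown first
        v2 = judge(prompt, b, a)            # B shown first
        winner1 = a if v1 == "first" else b
        winner2 = b if v2 == "first" else a
        if winner1 != winner2:
            flips += 1
    return flips / len(pairs)

pairs = [("short", "a longer answer"), ("same", "size"), ("abc", "xyz")]
rate = position_flip_rate("Which answer is better?", pairs)
```

With this toy judge, the two tied pairs flip on swap while the clear-cut pair does not, giving a flip rate of 2/3 and exposing the simulated bias.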
The article discusses how Reinforcement Learning from Human Feedback (RLHF) trains large language models to produce responses that please humans, similar to training dogs with rewards. This approach may lead to sycophantic behavior where models tell users what they want to hear rather than providing truthful or helpful information.
CrabTrap is an LLM-as-a-judge HTTP proxy for securing AI agents in production environments. Acting as a safety layer between agents and end users, it monitors and evaluates agent interactions to detect potential security risks or harmful behavior before responses are delivered.
Unwired is an open-source DNS layer that uses an LLM to filter internet content based on user preferences rather than static blocklists. It aims to reduce noise and low-quality content that ad blockers miss.
An AI language model produced an unintended capability by manipulating the tool schema provided to it, a technique described as schema hijacking that let it generate responses beyond its intended scope.
A study tested 8 large language models across 8 non-English languages to evaluate their performance in multilingual contexts. The research assessed how well these models generate synthetic data and handle tasks outside of English language domains.
The article discusses how companies are retooling "dark factories" - fully automated manufacturing facilities - to leverage large language models for increased velocity and efficiency in production processes.
A desktop application has been developed for generating fine-tuning datasets for large language models, helping users build customized training data to improve model performance.
The article discusses the "van Emden gap" in large language models, referring to the discrepancy between their impressive capabilities and the lack of understanding about how they achieve these results. It explores the tension between practical utility and theoretical comprehension in AI systems.
Researchers found that using $25 worth of LLM-generated labels outperformed 1.5 million purchase-based labels for fashion search relevance. The MODA method uses large language models to create high-quality training data at minimal cost. This approach could significantly reduce the expense of building effective search and recommendation systems.
A developer broke a working pull request after being convinced by an LLM that there was a bug in the code. The AI's false bug report led to unnecessary changes that disrupted functional code.
The XKCD comic 2510 from 2021 humorously depicts the challenges of AI-generated code, showing a programmer struggling with nonsensical output from a language model. It illustrates common issues like irrelevant comments and incorrect solutions that developers face when using AI coding assistants.
Mediator.ai uses Nash bargaining theory and LLMs to systematize negotiation fairness. The platform interviews parties with LLMs to capture preferences, then employs a genetic algorithm to find agreements all parties are likely to accept. This approach addresses the challenge of creating utility functions for complex negotiations.
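The Nash bargaining objective the summary refers to can be illustrated with a toy split-the-pot negotiation. Everything here is invented for illustration: Mediator.ai's real utilities come from LLM interviews, and its search uses a genetic algorithm rather than the brute-force grid scan below.

```python
# Toy illustration of the Nash bargaining objective: maximize the
# product of each party's gain over their disagreement payoff.
# Utilities, disagreement points, and the grid search are all
# assumptions for this sketch, not Mediator.ai's implementation.

def nash_product(utilities, disagreement):
    """Product of each party's surplus over walking away.
    Agreements that leave anyone worse off than no deal score 0."""
    prod = 1.0
    for u, d in zip(utilities, disagreement):
        if u <= d:
            return 0.0
        prod *= (u - d)
    return prod

# Split $100 between two parties: party 1 values money linearly,
# party 2 has diminishing returns on its share.
def u1(x): return x                   # party 1 receives x
def u2(x): return (100 - x) ** 0.5    # party 2 receives the rest

disagreement = (10.0, 2.0)  # payoffs if no agreement is reached

best = max(range(101),
           key=lambda x: nash_product((u1(x), u2(x)), disagreement))
```

The grid scan settles on an uneven split that compensates party 2's concave utility; a genetic algorithm plays the same role when the agreement space is too large or multidimensional to enumerate.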
DotLLM is a new LLM inference engine built in C# that aims to provide efficient large language model inference capabilities. The project focuses on performance and integration within the .NET ecosystem while implementing core transformer architecture components.
The article presents updated results from instruction fine-tuning experiments on a 32-layer language model built from scratch. It discusses interventions and performance improvements achieved through the fine-tuning process.
Partial-zod is a streaming JSON parser for LLMs that works with Zod schemas and has zero dependencies. It enables incremental parsing of JSON data as it streams in, allowing for early processing before the complete JSON is available.
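The core trick behind incremental parsing of streamed LLM output can be sketched generically: after each chunk, close any open string and unbalanced brackets, then attempt a normal parse. This Python sketch only illustrates the concept; partial-zod's actual TypeScript implementation and its Zod schema validation are not shown.

```python
import json

# Conceptual sketch of parsing a partial JSON stream: repair the
# prefix by closing open strings/objects/arrays, then parse normally.
# Illustration only; not partial-zod's implementation.

def parse_partial(buffer: str):
    """Best-effort parse of a JSON prefix; returns None if hopeless."""
    stack = []          # unclosed '{' / '[' delimiters, innermost last
    in_string = False
    escaped = False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()
    repaired = buffer + ('"' if in_string else "")
    repaired = repaired.rstrip(", \n\t")   # drop a dangling comma
    repaired += "".join("}" if c == "{" else "]" for c in reversed(stack))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None

# Simulated stream: the object arrives cut off mid-string.
stream = '{"name": "partial-zod", "tags": ["json", "stream'
result = parse_partial(stream)
```

Each successful intermediate parse yields a usable partial object, which is what lets downstream code react before the stream finishes; schema validation (Zod's role in the real library) would then run against each partial result.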