Blog | SparkFabrik

AI for developers: the open source software revolution

Written by SparkFabrik Team | Apr 17, 2025 12:18:04 PM

Generative AI and its accessibility through open source tools are transforming the development landscape. In this article, we explore AI tools for developers and their benefits, analyzing how they are reshaping the software development process and the wider IT industry.

Open source and the spread of AI tools for developers 

Open source is playing a key role in the evolution of Generative Artificial Intelligence technologies, once again demonstrating its ability to drive innovation in emerging technology areas. A de facto standard for modern software solutions and cloud technologies, open technologies are now used by 90% of organizations worldwide. This trend is also extending to the AI sector, which saw explosive growth in 2023-2024 and continues to expand rapidly.

However, applying open source principles to generative AI raises new challenges, such as adapting the definition of open source software to large language models (LLMs), which involve massive datasets and dedicated hardware as well as code.

Despite these complexities, the open source ecosystem for GenAI is rapidly evolving in all areas of development, from runtime platforms and inference engines to vector databases and development frameworks. This shows how the open approach is enabling communities and companies to collaborate to unlock the full potential of technologies that might otherwise be controlled by only a few players.

How do developers feel about this ongoing revolution? Sentiment in the development community appears to be positive, with 72% of developers reporting a favorable or very favorable view of AI tools in the development workflow.

So, let's see in more detail what areas of work are affected by new AI tools for developers, and in particular the open source tools analyzed by Paolo Mainardi, CTO of SparkFabrik, for Linux Foundation Newsroom.

What is Open Source AI?

Open Source AI represents an approach to Artificial Intelligence in which models, tools and datasets are publicly accessible, and developers and companies can contribute, modify and deploy technologies without having to depend on proprietary solutions. This model follows open source principles already established in the traditional software industry, with the goal of making AI more transparent, collaborative, and accessible to a broad community of innovators.

Mozilla recently proposed a more structured definition of Open Source AI, emphasizing that “to be truly open source, an AI system must provide access and editing capabilities to all of its key components: code, data, and models”.【Mozilla, Open Source AI Definition】 This point is crucial because many AI solutions, while claiming to be open, limit access to training datasets or model weights, creating a kind of “half open source.”

McKinsey also highlights this problem, stating that “open source AI offers greater transparency and customization, but poses crucial questions about governance, security, and scalability”.【McKinsey, Open Source in the Age of AI】 The adoption of open AI models therefore requires a balance between innovation and risk management, especially considering emerging ethical and regulatory implications, such as the European AI Act.

A key contribution to the definition of open source AI also comes from the Open Source Initiative (OSI), which is working on a clear and structured definition. According to their most recent draft, “in order for an AI system to be considered truly open source, four fundamental freedoms must be guaranteed: freedom to use the system for any purpose, freedom to study and understand how the model works, freedom to modify it, and freedom to share it”.【OSI, Open Source AI Definition Draft】 However, OSI stresses that these freedoms must apply not only to the code, but also to the training data and model weights, elements often excluded by traditional open licenses.

The importance of open source AI lies not only in the democratization of the technology, but also in the key role it can play in fostering greater trust in artificial intelligence tools. Indeed, the ability to analyze, modify, and improve the code makes it possible to avoid situations in which a few players entirely control the AI landscape, with significant impacts on the security and competitiveness of the industry.

Runtime platforms and open source models

First of all, let’s start with open source platforms and large language models (LLMs), which are becoming true mainstays in the field of Generative Artificial Intelligence. Similar to solutions such as OpenAI and DeepSeek, these platforms require several components working together to function effectively. In this context, the choice of inference engine is critical, as it must be compatible with the hardware used, such as CPU or GPU, to avoid slowdowns. 

Once the engine is chosen, it is important to know how to interact with it using programming tools. However, these tools can often be complex for less experienced developers. Here is where the potential of AI for developers comes into play, through more user-friendly platforms that can handle the technical complexity for the user. 

Among the most interesting open source solutions are Ollama, LocalAI, GPT4All, and Jan. They offer similar functionality but cater to different needs, from the expert to the end user looking for an alternative to mainstream commercial platforms.
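To give a sense of how thin the layer between a developer and a local model can be, here is a minimal Python sketch that talks to Ollama over its local HTTP API. It assumes an Ollama server running on its default port (11434) with a model already pulled; the model name `llama3.1` and the helper names are our own choices for illustration, not something prescribed by the tools above.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust host and port for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming text-generation request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server answers with a single JSON object.
    return body["response"]
```

With a server running, `generate("Explain RAG in one sentence")` would return the model's answer as plain text. Only the standard library is used here, so nothing beyond Python and the runtime platform itself is assumed.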

Vector databases

Vector databases are also emerging as major elements in the ecosystem of Gen AI applications, particularly to implement advanced techniques such as Retrieval-Augmented Generation (RAG).  

RAG has become critical to overcome the limitations of LLMs trained on static public datasets. RAG allows LLMs to be linked to external data sources, significantly improving the accuracy and timeliness of the responses generated.  

The RAG process consists of three main phases: data preparation, retrieval, and generation. The data preparation phase is particularly critical and involves gathering information from a variety of sources, converting it to plain text, splitting it into meaningful chunks, and saving it to a database along with its vector representation.  
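As a rough illustration of the chunking step in data preparation, the sketch below splits plain text into overlapping word-based chunks. This is only one simple strategy among many (real pipelines often split on sentences, headings or tokens), and the function name and default sizes are assumptions made for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the price of storing some words twice.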

This vector representation, obtained through the process of embedding using specialized AI models, allows mathematical operations to be performed to search for similar documents based on the distance between vector points. Vector databases are specifically designed to handle these operations, from embedding to similarity searching, making it possible to efficiently implement RAG applications.
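To make the idea of searching by vector distance concrete, here is a small self-contained sketch. Note the loud assumption: `toy_embed` is a bag-of-words stand-in we made up for the example, not a real embedding model, and a vector database would replace the linear scan in `top_k` with an index. Only the cosine similarity calculation is the standard formula.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 means same direction)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

A learned embedding model would also match texts that share meaning but not words, which this toy version cannot do; the ranking logic stays the same.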

Roberto Peruzzo, in his talk “AI and Typesense: how to integrate semantic search into Drupal” provides a practical example of the use of vector databases in development. In his talk, he illustrates Typesense, an open-source search engine that improves the search experience on a site, demonstrating how AI and semantic search can revolutionize the way users find content on Drupal sites. 

Other vector database projects to note are: Chroma, Weaviate, pgvector, Milvus and Qdrant.

Development Frameworks

Development frameworks are acting as real accelerators for innovation in the AI field. This is because the development of GenAI applications inevitably requires the integration of multiple components, from data management to interaction with different types of AI models, each with its own APIs, weights and configurations.  

It is this complexity that has led to the creation of specialized frameworks that simplify and accelerate the development process. Just as Docker and Kubernetes revolutionized the software development landscape over the past decade, GenAI frameworks are rapidly redefining the technology industry.

At this juncture, we can name complete frameworks such as LangChain and LlamaIndex. Other notable projects include Microsoft Semantic Kernel and AutoGen, which focus on specific aspects of AI development. Frameworks such as Haystack and Vercel AI, on the other hand, offer more targeted solutions for specific use cases.

IDEs and development assistants

Integrated Development Environments (IDEs) and development assistants based on Generative Artificial Intelligence (GenAI) are redefining the programming landscape, promising to revolutionize the way developers write, understand and maintain code.  

The concept of an automated coding assistant dates back to the 1970s, but the advent of large language models has taken this technology to a whole new level. Tools such as GitHub Copilot have paved the way for a new generation of AI assistants, spurring the development of open source models specifically trained for coding assistance. In addition to improving code quality, one of the most significant benefits of these tools is increased code safety through code suggestions that reduce common errors. 

Unlike traditional IntelliSense systems, which offer deterministic suggestions based on static analysis, these new AI assistants leverage the ability of GenAI models to produce context-based text, generating entire blocks of relevant code. This capability, however, brings with it the risk of “hallucinations.”

An outstanding example of an AI development assistant is Anthropic's Claude and its newest tool, Claude Code, which allows programmers to delegate complex tasks directly from the terminal, improving programming efficiency and accuracy.

Among open source projects, Sourcegraph's Cody and Continue stand out. These tools offer advanced features such as context-based code completion, explanation of existing code, and automatic test and documentation generation.

AI for developers: successful use cases

The AI revolution for developers is not just theory; at SparkFabrik we have implemented two concrete use cases that are transforming our development processes, improving the efficiency and quality of our work. 

SparkBot: contextual intelligence at the fingertips of teams 

SparkBot represents our implementation of a Retrieval-Augmented Generation (RAG) platform integrated directly into our GitLab ecosystem. Developed by the Platform Team in the R&D context, this AI assistant has two main goals: to become familiar with RAG technology and to spread its knowledge within the company.

The bot is distinguished by its ability to access and contextualize organization-specific information, answering questions about projects, service configurations, and team organization. For example, a developer may ask, “What projects does the Alpha team manage?” or “Who are the internal and external owners of project X?” receiving accurate answers based on up-to-date data. 

SparkBot's architecture is based on Python, LangChain and ChromaDB, with direct integration with Slack to facilitate interaction. The system draws on several internal data sources, including:

  • Our enterprise documentation portal
  • The CI/CD component catalog 
  • Reference books such as "Cloud Native Transformation" and "Team Topologies"

This implementation, while still in the experimental stage, demonstrates how RAG technology can be used to create an accessible and contextual "enterprise intelligence" that enhances the organization's knowledge assets.
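For readers curious about what the generation step of such an assistant can look like, here is a minimal sketch of how retrieved chunks and a user question might be combined into a single prompt. This is not SparkBot's actual code: the function name and the prompt wording are assumptions made purely for illustration.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the user question into a single LLM prompt."""
    # Number the chunks so the model (and a human reader) can refer to them.
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Asking the model to admit when the context is insufficient is a common way to limit invented answers, although it does not remove the risk entirely.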

PR Agent: automated code review to improve code quality 

The second use case implemented at SparkFabrik is the adoption of the Codium PR Agent, an AI assistant that intervenes directly in GitLab pipelines to improve the code review process. This tool automatically monitors new merge requests and provides instant feedback, radically transforming the approach to code review.

When an MR is opened, the PR Agent automatically generates a description based on the title and content, analyzes the code, and suggests specific improvements. Developers can also interact with the bot through specific commands such as /review to request a full review, /improve to get suggestions for improvements, or simply /ask to ask questions about the code. 
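The command-based interaction can be pictured with a tiny sketch that recognizes a command at the start of a merge request comment. Only the three command names come from the description above; the parser itself is a hypothetical illustration and not how PR Agent is implemented.

```python
from typing import Optional, Tuple

# Commands named in the text; everything else here is a hypothetical illustration.
KNOWN_COMMANDS = {"/review", "/improve", "/ask"}


def parse_command(comment: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument) if the comment starts with a known command."""
    parts = comment.strip().split(maxsplit=1)
    if not parts or parts[0] not in KNOWN_COMMANDS:
        return None  # an ordinary comment, nothing for the bot to do
    argument = parts[1] if len(parts) > 1 else ""
    return parts[0], argument
```

In this sketch, a comment such as "/ask why is this loop needed?" yields the command "/ask" with the rest of the line as its argument, while ordinary comments are ignored.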

The PR Agent implementation is based on a containerized architecture with an extended Docker image with a custom entrypoint and an Nginx container as a reverse proxy. The system supports AI models from OpenAI and Anthropic, with the ability to select the preferred model via environment variables.

This tool has proven particularly valuable for: 

  • Identifying potential bugs and vulnerabilities before they reach the production environment
  • Ensuring adherence to company code standards 
  • Speeding up the review process, allowing human reviewers to focus on more strategic aspects
  • Providing an ongoing learning opportunity for developers through contextual suggestions

Impact and results

Implementing these AI tools has led to tangible benefits at SparkFabrik: 

  1. Reduced onboarding time: new developers can quickly familiarize themselves with company projects and standards thanks to SparkBot.
  2. Increased code quality: the PR Agent has helped standardize best practices and reduce potential bugs in production. 
  3. Skill growth: both tools act as virtual mentors, exposing developers to optimal patterns and techniques.
  4. Operational efficiency: speeding up code reviews and reducing time spent searching for information have optimized the overall workflow.

These use cases demonstrate that AI for developers is more than a tool for completing basic tasks: it is a true multiplier of productivity and quality, capable of transforming development processes and enhancing the human and knowledge assets of the company.

Challenges and best practices for the ethical use of AI in development 

In conclusion, the adoption of Generative Artificial Intelligence (GenAI) in software development is rapidly transforming the field, offering many benefits and introducing as many new challenges. One of them is the training of large language models, which requires expensive hardware. For example, Meta stated that training Llama 3.1 required more than 16,000 Nvidia H100 GPUs, with costs of up to $640 million. This cost greatly limits competition and suggests the need to develop far more affordable training techniques.

Moreover, ethical and legal issues, such as those raised by the new European AI Act, require ongoing attention, and the definition of open source in AI is still evolving through the work of the Open Source Initiative (OSI).

Despite these challenges, the landscape is promising. Successful use cases, such as those implemented at SparkFabrik, demonstrate how developer AI can transform development processes and enhance the company's human and knowledge capital.

Open source AI tools not only assist in code completion, but also help simplify complex workflows, making the entire development process smoother and more manageable. As many as 81% of developers acknowledge an increase in productivity as a result of AI tools, although only 43% trust the results, with 31% remaining skeptical.  

There is still a long way to go and we are sure that there will be no lack of surprises. One certainty remains (today at least): we are fortunate to live in a time when open source is the predominant development model, giving everyone the opportunity to participate in this revolution.