Featured image: abstract neon circuit background (AI, DevOps, Platform Engineering).

In recent years, artificial intelligence (AI) has taken an increasingly central role in the DevOps landscape, radically transforming the way development and operations teams collaborate to build, test, and deploy software.

According to research conducted by GitLab on over 5,000 IT professionals, adoption of machine learning and AI technologies is growing steadily: 51% of respondents use AI for checking procedures, 37% for software testing, and 31% for code review, with all percentages up on the previous year.

The main technologies driving this evolution include machine learning, which allows systems to learn from data and improve over time; natural language processing (NLP), which enables computers to understand and generate human language; computer vision, which allows for the interpretation of visual data; and the integration of virtual assistants and chatbots, which facilitate interaction between users and systems.

These technologies are reshaping the software lifecycle, automating complex processes and improving operational efficiency and developer productivity. AI complements the skills and tools already available, as we analyzed in the article AI for Developers.

Beyond the integration of AI in DevOps, a new frontier is emerging: AI applied to Platform Engineering. A natural evolution of DevOps, or rather a concrete and targeted implementation of it, Platform Engineering focuses on creating and managing scalable, automated development platforms, and it leverages artificial intelligence to optimize infrastructure and improve the developer experience. AI + Platform Engineering thus represents the next step in the evolution of DevOps practices, promising further innovation and efficiency for the world of software development.

Why AI in DevOps? Benefits & Advantages

More and more organizations are looking at AI as a value accelerator for producing quality software. In a context where speed in bringing new products or features to market is increasingly important, automating DevOps chains represents a concrete response to the challenges of complexity and scalability that companies face. Infusing AI into DevOps processes allows organizations to increase productivity, reduce error margins, and strengthen the entire technological ecosystem.

One of the most evident aspects of this transformation is the speed with which development and deployment phases can be completed. According to GitLab's survey, over 60% of DevOps teams have recorded a reduction in release times thanks to AI-driven tools for code review, testing, and automated deployment. A notable example is Bosch, which introduced machine learning technologies into its DevOps pipelines and achieved a 30% acceleration in software release times for automotive embedded systems, without sacrificing the quality of the final product.

AI, however, is not just a matter of speed but also of consistency. Artificial intelligence also proves useful in planning activities and managing resource consumption (typical of cloud providers), supporting organizations in Cloud Management and FinOps activities as well.

By standardizing complex and time-consuming activities such as code reviews, test validation, or technical documentation generation, it is possible to significantly reduce human errors. This is evident in the case of Microsoft, which has integrated GitHub Copilot into its development cycle: according to published data, over 40% of the code written by internal teams has been co-generated by the AI assistant, contributing to greater consistency and quality.

Beyond DevOps: AI and Platform Engineering

Platform Engineering is a highly advantageous practice for organizations seeking to industrialize their software development lifecycle. If DevOps builds bridges between Dev and Ops in terms of internal culture and practices, Platform Engineering provides the concrete infrastructure and internal products to put it into practice and make it sustainable and scalable, particularly in complex business contexts. Naturally, innovations in the AI world enrich this approach as well, further increasing the benefits of platforms that are now becoming intelligent.

With AI integrated at the platform level, repetitive and time-consuming tasks such as resource provisioning, environment configuration, or performance monitoring are entrusted to intelligent agents capable of dynamically adapting to the context. The infrastructure thus becomes increasingly autonomous, able to self-regulate and to learn over time from the behavior of the applications and teams that use it. In complex environments such as multi-cloud or hybrid ones, this capability translates into shorter configuration times and a drastic reduction in setup errors.

But perhaps the most significant benefit concerns the developer experience. Intelligent platforms reduce the "cognitive load" of managing the development environment, letting developers focus exclusively on code. Tools like Humanitec and Backstage, or internal solutions from companies like Spotify and Zalando, demonstrate how a well-designed developer platform, enhanced by AI, can improve productivity, reduce the number of operational tickets, and increase the sense of ownership within teams.

However, this new frontier is not without pitfalls. The reliability of artificial intelligence models, especially unsupervised ones, represents a critical point: a system that learns autonomously could generate unexpected behaviors if not properly monitored. Furthermore, delegating key processes to AI models introduces new attack surfaces and vulnerabilities, forcing platform and security engineering teams to rethink software security mechanisms, in terms of observability, audit, and control.

Applications of artificial intelligence in DevOps and Platform Engineering

AI has given rise to numerous use cases and experiments, designed precisely to validate benefits that in most cases still exist only on paper. The arrival of copilots has helped accelerate this process, but bolder organizations have gone further, with use cases such as the following:

Code suggestion

AI-based coding assistants, such as GitHub Copilot and Claude Code, are becoming indispensable tools for developers. These tools provide real-time suggestions, improving code quality and accelerating the development process. For example, JPMorgan Chase reported an increase in software engineer efficiency of up to 20% thanks to the adoption of an AI-based coding assistant.
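
As a rough, hedged illustration of the mechanism behind these assistants, the sketch below requests a completion for a partially written function through the OpenAI Python SDK. The model name, prompt, and `suggest_code` helper are illustrative assumptions; editor-integrated tools such as GitHub Copilot and Claude Code are wired into the IDE through their own plugins rather than a script like this.

```python
# Illustrative only: a minimal "code suggestion" request to a general-purpose LLM
# via the OpenAI Python SDK. The model name and prompt are assumptions, not how
# Copilot or Claude Code work internally.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_code(snippet: str) -> str:
    """Ask the model to complete a partially written function."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[
            {"role": "system", "content": "You are a coding assistant. Complete the code you are given."},
            {"role": "user", "content": snippet},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(suggest_code("def parse_iso_date(value: str):\n    # TODO: return a datetime\n"))
```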

Automated testing

AI is transforming software testing, allowing for test automation and proactive error identification. According to GitLab, integrating AI into the testing process allows for detecting anomalies in log data and other sources, helping DevOps teams identify potential problems before they become critical.
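
As a concrete, hedged sketch of what such anomaly detection can look like in its simplest form, the snippet below flags unusual spikes in hourly error counts extracted from logs using a rolling z-score. The window size and threshold are arbitrary assumptions, and real AI-driven testing tools use considerably richer models.

```python
# Minimal anomaly detection over hourly error counts extracted from logs.
# A rolling z-score stands in for the more sophisticated models used by
# AI-driven testing and observability tools; window and threshold are assumptions.
import numpy as np

def flag_anomalies(error_counts, window=24, threshold=3.0):
    """Return indices of hours whose error count deviates strongly from the trailing window."""
    counts = np.asarray(error_counts, dtype=float)
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mean, std = baseline.mean(), baseline.std()
        if std == 0:
            continue
        if abs(counts[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies

# Example: a quiet day with a sudden burst of errors in the final hour.
hourly_errors = [2, 3, 1, 2, 4, 3, 2, 1, 2, 3, 2, 2, 3, 1, 2, 2, 3, 2, 1, 2, 3, 2, 2, 1, 40]
print(flag_anomalies(hourly_errors))  # -> [24]
```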

CI/CD

AI improves continuous integration and deployment (CI/CD) processes by automating data analysis and optimizing workflows. Tools such as those offered by Dynatrace use AI to analyze observability data, providing predictive insights and suggestions for improving CI/CD pipelines.
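
To make the idea less abstract, here is a small, hedged sketch of the kind of check such tools automate: it compares the duration of the most recent pipeline runs against a historical baseline and reports a suspected regression. The data layout and thresholds are illustrative assumptions, not Dynatrace's API or algorithm.

```python
# Illustrative sketch: flag a slowdown in CI pipeline durations by comparing
# recent runs against a historical baseline. Thresholds are assumptions;
# real AI-driven tools work on much richer observability data.
from statistics import mean

def detect_pipeline_regression(durations_sec, recent=5, slowdown_factor=1.5):
    """Return a warning if the last `recent` runs are much slower than the earlier baseline."""
    if len(durations_sec) <= recent:
        return None
    baseline = mean(durations_sec[:-recent])
    current = mean(durations_sec[-recent:])
    if current > slowdown_factor * baseline:
        return (f"Pipeline regression suspected: recent average {current:.0f}s "
                f"vs baseline {baseline:.0f}s")
    return None

# Example: durations (in seconds) of the last 15 pipeline runs.
runs = [310, 305, 298, 320, 315, 300, 310, 305, 299, 312, 480, 510, 495, 505, 520]
print(detect_pipeline_regression(runs))
```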

Monitoring and security

AI enables continuous and proactive monitoring of infrastructure through agents, improving system security and resilience. For example, Dynatrace uses a proprietary AI engine, Davis, to provide predictive analytics and automatically identify the root causes of problems, improving security and operational efficiency.
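
As a hedged sketch of what proactive monitoring can build on, the snippet below pulls an error-rate metric through Prometheus' standard HTTP query API and applies a simple deviation check. The endpoint URL, PromQL query, and threshold are assumptions about a typical setup, and the statistical rule is a toy stand-in for an engine like Davis.

```python
# Hedged sketch: poll an error-rate metric through Prometheus' HTTP API
# and flag a sudden spike against recent history. Endpoint, query, and
# threshold are assumptions; engines like Dynatrace Davis perform far
# richer root-cause analysis on top of this kind of data.
import requests
from statistics import mean, stdev

PROMETHEUS_URL = "http://prometheus.example.internal:9090"   # hypothetical endpoint
QUERY = 'sum(rate(http_requests_total{status="500"}[5m]))'   # hypothetical metric

def fetch_error_rate():
    """Run an instant query and return the current value of the (single) series."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def is_spike(history, latest, threshold=3.0):
    """Flag the latest value if it deviates strongly from recently collected samples."""
    if len(history) < 5 or stdev(history) == 0:
        return False
    return (latest - mean(history)) / stdev(history) > threshold

if __name__ == "__main__":
    history = [0.2, 0.3, 0.25, 0.2, 0.3]   # illustrative previous samples
    latest = fetch_error_rate()
    print(f"current 5xx rate: {latest:.2f}/s, spike: {is_spike(history, latest)}")
```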

Applications of AI in Platform Engineering

In the context of Platform Engineering, AI automates tasks such as code change management, software testing, complex integrations, and security issue management. Additionally, AI facilitates the analysis of centralized platform data, such as CI/CD logs, deployment histories, configurations, and system metrics, generating accurate insights with little noise. For example, it can quickly analyze log data and alerts to produce focused summaries, such as highlighting failures from the last 24 hours.
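
As a hedged example of the "failures from the last 24 hours" use case above, the sketch below filters structured CI/CD log records by timestamp and builds a compact summary that could be handed to an engineer, or passed to an LLM for a natural-language digest. The record schema is an assumption for illustration, not any specific platform's format.

```python
# Hedged sketch: summarise CI/CD failures from the last 24 hours out of a stream
# of structured log records. The record schema is an assumption; a real platform
# would read from its own log store and could feed the summary to an LLM.
from collections import Counter
from datetime import datetime, timedelta, timezone

def summarize_failures(records, now=None):
    """Count failed pipeline runs per project over the trailing 24 hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    failures = [
        r for r in records
        if r["status"] == "failed" and datetime.fromisoformat(r["timestamp"]) >= cutoff
    ]
    by_project = Counter(r["project"] for r in failures)
    lines = [f"{count} failed run(s) in {project}" for project, count in by_project.most_common()]
    return f"{len(failures)} failures in the last 24h:\n" + "\n".join(lines)

# Example records (hypothetical schema).
records = [
    {"project": "payments-api", "status": "failed", "timestamp": "2024-05-20T09:15:00+00:00"},
    {"project": "payments-api", "status": "success", "timestamp": "2024-05-20T10:00:00+00:00"},
    {"project": "web-frontend", "status": "failed", "timestamp": "2024-05-20T11:30:00+00:00"},
]
print(summarize_failures(records, now=datetime(2024, 5, 20, 12, 0, tzinfo=timezone.utc)))
```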

DevOps AI tools: useful solutions to improve productivity

With the entry of AI into the DevOps landscape, commercial tools and new open-source solutions are gradually evolving to become AI-powered or to vertically integrate toolchains (as in the case of Platform Engineering for the AI/ML world). Here are some examples:

  • GitHub Copilot: AI assistant for programming that uses OpenAI's GPT model to suggest code in real-time, improving productivity and learning.
  • Datadog: monitoring and analysis platform that uses machine learning algorithms to detect anomalies, analyze root causes, and predict problems.
  • PagerDuty: incident management platform, which uses AI to optimize response to critical events, analyzing patterns to prevent future problems.
  • KitOps: unified packaging and versioning for all AI/ML project artifacts.
  • Kubeflow: simplified management of ML workflows on Kubernetes.
  • DVC (Data Version Control): ensures reproducibility by tracking datasets, code, and experiments.
  • Prometheus: real-time monitoring of ML infrastructure and deployments.
  • MLflow: manages the ML lifecycle, including experiment tracking, deployment, and model versioning; a minimal tracking sketch follows the list below.
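
To ground at least one of these tools, here is a minimal MLflow tracking sketch that logs parameters and a metric for a single run, which is what makes experiments comparable across a platform. The tracking URI and experiment name are assumptions; point them at your own server.

```python
# Minimal MLflow tracking sketch: log parameters and a metric for one training run.
# The tracking URI and experiment name are assumptions for illustration.
import mlflow

mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # hypothetical server
mlflow.set_experiment("demo-experiment")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 10)
    # ... train the model here ...
    mlflow.log_metric("accuracy", 0.93)
```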

The future of AI in DevOps and Platform Engineering

It is evident that, in the DevOps and Platform Engineering context, AI currently expresses only a small part of its expected potential – as also emphasized by Patrick Debois in his talk dedicated to AI Platform Engineering. The constant push of Big Tech in search of the next competitive advantage, as well as organizations' investments in this direction, certainly represent positive signals. However, it is important to keep in mind a series of challenges, such as:

  • The need to maintain human supervision. No matter how sophisticated the algorithms are, human intervention remains essential to ensure the proper functioning of systems. AI does not replace judgment but amplifies it: it lets teams operate more quickly and with greater precision, but it still needs to be guided.
  • Data quality and security. AI models are only as effective as the data that feeds them. If that data is incomplete, distorted, or unrepresentative, the results will be equally unreliable.
  • Secure and local prototyping. To facilitate the adoption of AI, and above all to deepen the real understanding of these technologies, it is crucial to provide business teams with prototyping environments and local development environments (LDE). Without them, it is difficult to move from technical enthusiasm to practical implementation and to experiment and prototype with AI quickly and safely.
  • Transparency and responsibility. In a technological ecosystem where AI makes autonomous or semi-autonomous decisions, it becomes essential to know how and why those decisions were made. If on one hand AI increases output and automates intelligently, on the other we must avoid falling into the classic "ironies of automation," maintaining oversight, control, understanding, and responsibility for the work of these tools. This implies traceability of the outputs generated by the models, the possibility of auditing certain automated processes, and shared guidelines on the correct use of AI.

Do you want to know how DevOps techniques can improve your business?