AI in Software Testing: The LLM Way
AI is transforming traditional software testing into a more efficient, accurate, and intelligent process. Advanced LLMs backed by machine learning algorithms can autonomously generate unit test cases, simulate user interactions, run the tests, and analyze the output to identify security vulnerabilities and other issues, improving the code with little to no human intervention.
The LLM Way
Since the introduction of LLMs, they have been widely used for testing and generating code. The most common approach is to hand the code to an LLM and ask it to review or refactor it; this is the traditional way. The newer, more efficient approach is to create agents with different roles, each with a backstory, a goal, and access to tools for testing the code.
LLM Model
The most crucial part of software testing with LLMs is selecting the model. There are tens of thousands of models out there, both paid and publicly available. Paid models such as Gemini and ChatGPT give good results, but they carry the risk of code being sent outside the development premises, which is a major concern for many companies. The solution is to back the agents with an open-source model such as IBM Granite, Mistral, or Llama 3, or a Mixture-of-Experts model like llama2-hermes-orca-platypus-wizardlm.
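For teams that want to keep code in-house, one option is to serve an open-source model locally and query it directly. The sketch below is a minimal illustration, assuming the model is hosted through Ollama with the Mistral model already pulled; the article does not prescribe a serving stack, so treat the setup and the snippet under review as assumptions.

```python
# Minimal sketch: asking a locally hosted open-source model to review code.
# Assumes Ollama is running locally with the "mistral" model pulled
# (`ollama pull mistral`), so no code leaves the development premises.
import ollama

code_snippet = '''
def divide(a, b):
    return a / b
'''

response = ollama.chat(
    model="mistral",
    messages=[
        {"role": "system", "content": "You are a code reviewer. Point out bugs and missing tests."},
        {"role": "user", "content": f"Review this function:\n{code_snippet}"},
    ],
)
print(response["message"]["content"])
```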
What is an Agent?
An agent is an autonomous unit programmed to:
- Perform tasks
- Make decisions
- Communicate with other agents
Think of an agent as a member of a development team, with specific skills and a particular job to do. Agents can have different roles like ‘Manager’, ‘Developer’, or ‘Tester’, each contributing to the overall goal of the crew.
Agent Attributes
| Attribute | Description |
| --- | --- |
| Role | Defines the agent's function within the crew and determines the kind of tasks the agent is best suited for. |
| Goal | The individual objective the agent aims to achieve; it guides the agent's decision-making process. |
| Backstory | Provides context for the agent's role and what it is good at. |
| LLM | The large language model used by the agent. Defaults to open-source models such as Mistral MoE or IBM Granite. |
| Tools | The set of capabilities or functions the agent can use to perform tasks. Tools can be shared or exclusive to specific agents and are set when the agent is initialized. |
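Putting these attributes together, here is a minimal sketch of how one of the agents described below could be declared. It assumes a CrewAI-style framework (the "crew" terminology above points to one) and an illustrative local model identifier; the exact parameter values are assumptions, not a fixed specification.

```python
# Sketch: declaring an agent with the attributes from the table (CrewAI-style API assumed).
from crewai import Agent

test_case_generator = Agent(
    role="Test Case Generator",   # the agent's function within the crew
    goal="Write unit tests that cover every branch of the code under review",
    backstory=(
        "A meticulous QA engineer who has written unit tests for large "
        "codebases and never lets an edge case slip through."
    ),
    llm="ollama/mistral",   # assumed locally served open-source model
    tools=[],               # execution or search tools can be attached here
    verbose=True,
)
```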
The first step in building an autonomous software testing system is creating the agents. We can create different agents that talk to each other to solve a particular task. For software testing we will create agents with roles such as 'Test Case Generator', 'Test Executor', 'Result Analyzer', 'Developer', and 'Manager'.
Test Case Generator
This agent is responsible for writing unit test cases for the code. Its backstory can describe its skill at creating good unit test cases that cover every aspect of testing.
Test Executor
This agent runs the test cases. It has access to tools that can execute the software just as a human would, so test cases run without human intervention.
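As a concrete illustration, such a tool can be an ordinary function that runs the test suite in a subprocess and captures the output for the next agent. The sketch below uses pytest; the function name and arguments are illustrative, and in an agent framework it would be registered as a tool so the Test Executor can call it autonomously.

```python
# Sketch of a code-execution tool for the Test Executor agent.
# Runs pytest in a subprocess and returns the captured output so the
# Result Analyzer agent can inspect it.
import subprocess

def run_pytest(test_dir: str, timeout: int = 300) -> str:
    """Run pytest on the given directory and return the combined output."""
    result = subprocess.run(
        ["pytest", test_dir, "-q", "--tb=short"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return f"exit code: {result.returncode}\n{result.stdout}\n{result.stderr}"
```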
Result Analyzer
This agent analyzes the execution results produced by the Test Executor and passes them to the Manager agent.
Manager
This agent turns the analyzed results into an actionable report with solutions to the problems found. Its backstory can describe a seasoned engineer who has been coding for years and never fails to find the optimal solution to the problems it comes across. It may have access to internet tools for searching for solutions and summarizing the results.
Developer
The Manager agent passes the detailed report, along with the proposed solutions, to the Developer agent, which could be a backend or frontend developer depending on the task. The Developer refactors the code based on the fixes and passes the new code back to the Test Case Generator agent. Once the software has been tested thoroughly over multiple iterations, the Manager decides whether the reported defects have been corrected. There is also a default iteration cutoff so the execution does not end up in an infinite loop.
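The loop described above could be wired together roughly as follows. This is a sketch assuming the same CrewAI-style API as the earlier example; the task descriptions, the file name app.py, the stopping check, and the cutoff value of 5 are illustrative choices, not part of the article.

```python
# Sketch: wiring the five agents into a crew with a default iteration cutoff.
from crewai import Agent, Task, Crew

MAX_ITERATIONS = 5  # default cutoff so the generate-test-fix loop cannot run forever

def build_crew(code: str) -> Crew:
    # llm/tools omitted for brevity; they would be set as in the earlier sketch.
    generator = Agent(role="Test Case Generator",
                      goal="Write unit tests covering the given code",
                      backstory="A QA engineer who never misses an edge case.")
    executor = Agent(role="Test Executor",
                     goal="Run the generated tests and capture the results",
                     backstory="An automation specialist with access to execution tools.")
    analyzer = Agent(role="Result Analyzer",
                     goal="Summarize failures and likely root causes",
                     backstory="A debugging expert who reads stack traces fluently.")
    manager = Agent(role="Manager",
                    goal="Turn the analysis into an actionable report with fixes",
                    backstory="A veteran engineer who always finds the optimal solution.")
    developer = Agent(role="Developer",
                      goal="Refactor the code according to the Manager's report",
                      backstory="A backend developer who writes clean, tested code.")
    tasks = [
        Task(description=f"Generate unit tests for:\n{code}",
             expected_output="A pytest test file", agent=generator),
        Task(description="Execute the generated tests",
             expected_output="Raw test output", agent=executor),
        Task(description="Analyze the test output and list defects",
             expected_output="A list of defects", agent=analyzer),
        Task(description="Produce a report with proposed fixes",
             expected_output="An actionable defect report", agent=manager),
        Task(description="Refactor the code based on the report",
             expected_output="The corrected code", agent=developer),
    ]
    return Crew(agents=[generator, executor, analyzer, manager, developer], tasks=tasks)

code_under_test = open("app.py").read()  # hypothetical file under test
for iteration in range(MAX_ITERATIONS):
    result = build_crew(code_under_test).kickoff()
    # In practice the Manager's verdict would be parsed here; stop once no defects remain.
    if "no defects" in str(result).lower():
        break
```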
Benefits
- Improved accuracy: AI testing tools can learn what normal behavior looks like and then spot anything that does not fit, with higher accuracy than human reviewers.
- Adaptability: AI systems have the capacity to adapt and evolve over time, continuously learning from past testing experiences and incorporating new insights into their testing strategies through reinforcement learning.
- Ease of use: AI tools can be integrated into the development workflow, which provides feedback directly within code repositories or development environments.
- Time Effective: Manual code review is time-consuming, whereas an AI-based code review system can analyze code much faster and provide instant feedback with high accuracy.
- Cost Effective: Traditional code review can be expensive, whereas an AI-based code review system provides instant feedback at a fraction of the cost of a manual review.
Real world examples and use cases
- InApp AICoder: An internal tool developed by InApp Information Technologies for code refactoring, unit test case generation, and checking code for security vulnerabilities.
- Facebook’s Sapienz: Facebook uses an AI-powered tool called Sapienz for automated software testing at scale.
- Google’s DeepMind for Game Testing: Google uses its AI technology, DeepMind, for testing games.
- IBM Watson: IBM Watson uses AI to automate software testing processes, enabling faster and more accurate testing.
- Testsigma: Testsigma is a cloud-based continuous testing tool that uses NLP for test case creation and an AI-powered core for maintenance of all automated test cases.
- Mabl: Mabl employs AI for autonomous testing, creating and executing end-to-end tests that evolve alongside the application.
- Appvance.ai: Appvance.ai offers AI-driven testing solutions that generate thousands of test variations and data combinations to test applications thoroughly.
Challenges
Data Quality issue
LLM-based testing systems rely on the quality of the code dataset used to train the model. Non-standard code, buggy code, and untested code can lead to prediction issues and unreliable test results. The data also has to cover all scenarios and use cases, and maintaining and curating such high-quality data is a difficult process.
Model context length
Most open-source models have a context length of up to 32k tokens, which means we can only analyze roughly 1,600 lines of code at a time, and even that comes with challenges such as the 'lost in the middle' phenomenon. The best results from an open-source model came from Mistral, which has a 32k context length but performs well only for code under about 600 lines. Other models such as IBM Granite and llama2-hermes-orca-platypus-wizardlm showed promising results, but only for code under 500 and 200 lines respectively.
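The 1,600-line figure follows from assuming roughly 20 tokens per line of code (32,000 ÷ 20 ≈ 1,600). When a file exceeds the window, one simple workaround is to split it into chunks that fit before sending it to the model, as sketched below; the per-line token estimate and the reserved prompt budget are assumptions.

```python
# Sketch: splitting a large source file into chunks that fit a model's context window.
CONTEXT_TOKENS = 32_000
TOKENS_PER_LINE = 20     # rough average; measure with the model's tokenizer in practice
PROMPT_BUDGET = 2_000    # tokens reserved for instructions and the model's reply

def chunk_source(path: str) -> list[str]:
    """Split a source file into line-based chunks that fit the context window."""
    max_lines = (CONTEXT_TOKENS - PROMPT_BUDGET) // TOKENS_PER_LINE
    with open(path) as f:
        lines = f.readlines()
    return ["".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]
```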
Algorithm Biases
Machine learning bias refers to models producing skewed results that reflect human biases in society, including social and economic bias and social inequality. Bias is mostly introduced through the initial training data.
Code complexity
This relates to the complexity of the code generated by AI. Sometimes the AI-generated code is far more complex than necessary even when simpler solutions exist, which creates challenges of trust and adoption among testing teams.
Future Scope
The integration of AI technologies into software testing is revolutionizing software development practices. AI testing is expected to become the new standard within the next few years, with forecasts projecting the AI-enabled testing market to grow from $736.8 million to $2.7 billion by 2030. As new models are released, the testing capabilities of agents keep improving, and this will help human testers boost their productivity and ease of work. Fingers crossed 🤞