Penetration Testing: Generated Code Attacks

Firesand: End-to-End Security for your Business

Introduction

This is the fourth article in the series. It discusses how the concepts in some of our previous articles, including some of our primer articles, are connected. Many security vulnerabilities lie at the point between two systems: for example, where one system generates code or commands for another system to execute and, in particular, where the generating system uses external input to generate that code. The external input does not necessarily have to be direct user input; it could come from a configuration file, or from data (e.g., cookie values, HTTP headers, etc.) that can be manipulated by an attacker.

Audience

Who is this article for?

Understanding application layer security threats is important for a wide range of professions, including:

Security Architects and Solution Architects, as they need to understand the potential risks and mitigations to take appropriate design decisions, such as access control.
Software Engineers / Developers, because they need to understand how to build their applications in a secure manner and avoid critical mistakes.
Security Consultants of various types as they need to understand the risks of what they may be reviewing or consulting on.
Penetration Testers who need to understand how to attack these systems!

Pre-Reading

As reference for this article, it is worth reading the following articles that we have already published:

What is Generated Code?

Firstly, to understand generated code attacks, we need to understand what generated code is. In simple terms, generated code is code created by one application for another to execute.

Generated code arises when an application (be it a mobile app, desktop application, web application, or API) is built from multiple technologies that interact and, specifically, when one component (c₁) talks to another component (c₂) by passing it some form of programming code that c₂ interprets and then executes.

In this scenario, c₁ will have generated code in one way or another. This is extremely common; typical scenarios include:

An application of any kind that generates SQL code to be interpreted and executed by a database.
An application that generates NoSQL queries to be interpreted and executed by a NoSQL database.
An application that generates shell script (e.g., bash commands or batch commands) to be interpreted and executed by an Operating System (OS).
An application that generates XPath or XML to be interpreted and parsed by an XML parser.
An application that generates templates to be interpreted by a server-side or client-side templating engine.
An application that generates LDAP queries to be interpreted by a domain controller or similar (e.g., Active Directory).
An application (typically a web server) that generates HTML and/or JavaScript which will be interpreted and rendered by a browser.
Even an application that generates an image that is then processed by some image rendering system.

And there are many, many other options, especially these days with increasing use of JSON and Infrastructure-as-Code (IaC).

In all of these scenarios, the receiving component, c₂, trusts that the data supplied by c₁ is non-malicious and generally - due to the generic nature of the components - it would be impractical for c₂ to make a distinction between malicious and non-malicious data.

The following diagram highlights the basic scenario:

So, What are Generated Code Attacks?

Generated Code Attacks are where an attacker exploits the trust between two or more systems (i.e., between c₁ and c₂ as above). The receiving component (c₂), often a database, would receive a wide range of commands from the sender (c₁). It is practically difficult - or impossible - for c₂ to distinguish a malicious DROP or DELETE command from a legitimate one.

So, all those aforementioned generated code scenarios (and many more) can be mapped to well known (generated code) attacks/vulnerabilities, as follows:

SQL Injection.
NoSQL Injection.
Command Injection.
XPath Injection and XML External Entity (XXE) injection.
Server-Side Template Injection (SSTI) or Client-Side Template Injection (CSTI).
Lightweight Directory Access Protocol (LDAP) Injection.
Cross-Site Scripting (XSS).
Image processing: this one is interesting. It would most likely be a buffer overrun (also known as a buffer overflow) or an integer overflow attack, though depending on the image format it could be something entirely different.

Of course a buffer overrun in general also follows a similar pattern, but at a lower-level in the technology stack. A component is sending data to another component for interpretation and processing.

The receiving component trusts the data it is sent - which is the crux of the problem. Practically, in most scenarios, because the receiving component is often a generalised system and an attacker would necessarily send recognised commands, it is difficult for the receiving component to know the difference between a legitimate request and a malicious one.

The following diagram expands upon the previous, highlighting how there is an additional condition needed for an attack to succeed - externally supplied input (i₁) that is used in the generation of the generated code:

Diagram showing the context for a Generated Code Attack

The key point now is that i₁ is, in most cases, benign. However, in the case of an attack, this is malicious input.

The following presents two examples, based on the above sample scenarios, one discussing SQL Injection and the other discussing XSS.

SQL Injection

In this case, c₁ could be a web application and c₂ would be a database (e.g., MySQL, Oracle, SQL Server, etc.). Suppose c₁ is generating a SQL statement along the lines of:

SELECT product_name, product_description FROM TProduct WHERE product_name = '<user_supplied_input>'

This statement returns product details in some kind of retail / eCommerce web application, where <user_supplied_input> is the input i₁.

If i₁ is malicious, a number of attacks are possible. The attacker can inject a UNION SELECT to exfiltrate data from the database. In reality, an attacker could exfiltrate all information from the database if they are able to inject arbitrary SQL in a scenario such as this. Additionally, they could potentially launch destructive attacks using DROP and DELETE commands. If the database server supports something like xp_cmdshell (a database feature that allows the execution of shell commands) it would be possible to interact with the underlying OS that the database server is running on, and almost certainly be able to gain a remote shell (i.e., remote access to the database server).
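The UNION SELECT case can be reproduced in a few lines. The following is a minimal Python sketch, assuming an in-memory SQLite database stands in for c₂; the TUser table, its columns, and the sample rows are hypothetical and exist only to give the attacker something to exfiltrate. It is an illustration of the vulnerable pattern, not a recommended way to build queries.

```python
import sqlite3


def build_query(user_supplied_input: str) -> str:
    # c1 generates SQL by concatenating the external input i1 into the statement.
    # This is the vulnerable pattern described above.
    return (
        "SELECT product_name, product_description FROM TProduct "
        "WHERE product_name = '" + user_supplied_input + "'"
    )


def make_demo_db() -> sqlite3.Connection:
    # c2: an in-memory SQLite database (an assumption for this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TProduct (product_name TEXT, product_description TEXT)")
    # Hypothetical table holding data the attacker should never see.
    conn.execute("CREATE TABLE TUser (username TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO TProduct VALUES ('kettle', 'A 1.7 litre kettle')")
    conn.execute("INSERT INTO TUser VALUES ('alice', 'hash-of-alice-password')")
    return conn


conn = make_demo_db()

# Benign i1: the generated SQL does what the developer intended.
benign_rows = conn.execute(build_query("kettle")).fetchall()

# Malicious i1: closes the string literal, appends a UNION SELECT,
# and comments out the trailing quote that c1 adds.
malicious_input = "x' UNION SELECT username, password_hash FROM TUser --"
leaked_rows = conn.execute(build_query(malicious_input)).fetchall()

print(benign_rows)  # [('kettle', 'A 1.7 litre kettle')]
print(leaked_rows)  # [('alice', 'hash-of-alice-password')]
```

From the database's point of view, both statements are well-formed SQL from a trusted sender, which is why c₂ cannot tell them apart.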

XSS

In this case, c₁ could be a web application and c₂ would be the user's browser (e.g., Chrome, Firefox, Edge, etc.). Suppose c₁ is a C# web application (ASP.NET) generating HTML content using the code below:

Response.Write("<p> Your search for " + Request.QueryString["q"] + " returned the following results: </p>");

If an attacker were to specify i₁ (injecting into the q query string parameter) along the lines of:

<script>alert(1);</script>

This would cause the C#.NET code to generate the HTML content of:

<p> Your search for <script>alert(1);</script> returned the following results: </p>

This would be received by the browser and executed. Of course, this particular example is fairly benign (if slightly annoying, by introducing a pop-up alert box).

Addressing Generated Code Attacks and Why Care

At Firesand, we are often asked for advice with questions such as: "How do I stop SQL Injection?", "How do I stop XSS?", "How do I prevent XXE?" and so on. Our advice is not to focus on these specific attacks, as they are all instances of a wider class of problem: the Generated Code Attack. If you defend really well against one instance, do you know whether you have defended against any of the others? Whereas, if you resolve the Generated Code Attack problem, you fix not only the one you are concerned about, but also any others that you have not yet considered.

As most, if not all, generated code attacks rely on being able to supply unexpected input to a system, the primary defence against generated code attacks is input validation. Always ensure that input is in the expected form before you accept it and process it. For example, in the aforementioned SQL Injection and XSS scenarios, the supplied input would be expected to consist of lower and upper case alphanumeric characters (possibly with white space) - the exact valid input set is case-specific, of course! If the input is anything other than the expected input, it must be rejected. In doing so, most generated code attacks will fail.
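As a rough illustration of this kind of allow-list validation, the Python sketch below accepts only letters, digits, and spaces. The pattern, the 50-character limit, and the function name validate_search_term are assumptions made for this example; as noted above, the real valid input set depends on the case at hand.

```python
import re

# Assumed allow-list: 1 to 50 ASCII letters, digits, or spaces.
# The exact set and length limit are illustrative, not prescriptive.
ALLOWED_INPUT = re.compile(r"[A-Za-z0-9 ]{1,50}")


def validate_search_term(user_supplied_input: str) -> str:
    # fullmatch requires the whole string to match, so a valid prefix
    # followed by an injected payload is still rejected.
    if not ALLOWED_INPUT.fullmatch(user_supplied_input):
        raise ValueError("input rejected: not in the expected form")
    return user_supplied_input


for candidate in ["Kettle 2000", "x' UNION SELECT 1, 2 --", "<script>alert(1);</script>"]:
    try:
        validate_search_term(candidate)
        print("accepted:", candidate)
    except ValueError:
        print("rejected:", candidate)
```

The input is rejected rather than repaired: the check runs before c₁ generates any code, so neither the SQL Injection nor the XSS payload from the earlier examples reaches c₂.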

Another line of defence is to sanitise (via encoding) output data when dealing with data being sent to external systems (e.g., use URL Encoding and/or HTML Entity Encoding when a web application sends data back to a browser for rendering).
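For the XSS example, output encoding might look like the following Python sketch, which uses html.escape from the standard library for HTML Entity Encoding. It mirrors the earlier search page in Python rather than C#, and the function name render_search_heading is hypothetical.

```python
import html


def render_search_heading(query: str) -> str:
    # Encode the external input i1 before it becomes part of the generated HTML,
    # so the browser (c2) renders it as text instead of interpreting it as markup.
    encoded = html.escape(query, quote=True)
    return "<p> Your search for " + encoded + " returned the following results: </p>"


page = render_search_heading("<script>alert(1);</script>")
print(page)
# <p> Your search for &lt;script&gt;alert(1);&lt;/script&gt; returned the following results: </p>
```

The encoding has to match the context the data lands in: HTML Entity Encoding suits element content as shown here, while a value placed into a URL would need URL Encoding instead.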

Conclusion

This series has underscored the critical intersections where security vulnerabilities emerge in the architecture of modern software systems, particularly through the lens of generated code. As we have explored, the vulnerabilities primarily stem from the trust placed in automated processes that generate and execute code across disparate systems. This trust, while practically necessary, opens up avenues for attack through SQL injections, XSS attacks, and more, as detailed through the examples provided.

The fundamental challenge is to ensure that all code generated by one system and consumed by another is thoroughly scrutinised and sanitised. Security architects, developers, and testers must incorporate robust validation mechanisms at every stage of code generation and execution to safeguard against malicious inputs that can lead to catastrophic breaches. As technology continues to evolve at a rapid pace, the complexity of these interactions will only increase. The emergence of new paradigms such as IaC and the proliferation of APIs across micro-services architectures amplify the need for stringent security protocols.

In essence, many well-known security issues arise from this core concept: a system automatically generates code that another system trusts and executes. If the generating system fails to properly validate its input, this trust can be exploited, allowing attacks to propagate to the final receiving system.

Therefore, to ensure that an application or system does not fall foul of a wide range of security vulnerabilities, it must validate all inputs into any form of code generation.

30 Apr 2024

About the author

https://www.linkedin.com/in/chrisblake/

Chris Blake has over 20 years of experience in the information and cyber security field, and is a passionate and qualified Enterprise Security Architect and Privacy Professional who leads and delivers innovative solutions at Firesand Limited, a company he co-founded in 2016. His specialities include application security, enterprise security architecture, and privacy, with a strong track record of building and implementing ISO 27001 compliant and certified information security practices, application security programmes, and enterprise security architectures. He has a thirst for continual learning and a commitment to excellence, as demonstrated by his academic and professional credentials from prestigious institutions such as the University of Oxford, (ISC)², IAPP, SABSA, The Open Group, and ISACA.

Chris holds an MSc in Software and Systems Security from the University of Oxford, and an array of professional certifications: CISSP, ISSAP, CSSLP, CCSP, SSP.NET, SSP.JAVA, CISA, CISM, CIPP/E, CIPM, CIPT, FIP, SCF, TOGAF, CPSA, and CEH.

Chris' experience spans multiple sectors: Retail & eCommerce; Financial Services, Banking, & Payments; i-Gaming; Energy (Oil & Gas); Property Management & PropTech; and Data Science; as well as Defence.

His areas of interest include: penetration testing; regulation & privacy, including the impact on society; access control in software; security automation in development; application of cryptography; security architecture; risk modelling & analysis; HTTP architecture & web security; and IoT security.