# Arithmetic Obfuscation

## When to use it?

Arithmetic operations (`+`, `-`, `^`, `&`, `|`, ...) are usually compiled into an AArch64 instruction that
**exactly** matches the original operation[^1].

For instance, let's compile this function in which the `xor` operation is considered sensitive:

```c
void encode(uint8_t* data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    data[i] = data[i] ^ 0x23;
  }
}
```

It results in this sequence of AArch64 instructions:

```armasm
mov  w10, #35
...
ldrb w11, [x9]
eor  w11, w11, w10
strb w11, [x9], #1
```

From a reverse engineering perspective, the sensitive xor operation, compiled into the `EOR` instruction, is
straightforward to identify.

Thus, if the arithmetic operations performed by your function are sensitive, you should consider enabling this obfuscation pass.

As a result, when this obfuscation pass is enabled, the previous operation is transformed into a more complex expression:

```c
// Transformation of data[i] = data[i] ^ 0x23;
uint32_t C = data[i];
int32_t X = -(C & 0x23);
data[i] = ((C + (0xffffffdc | ~C) + 0x24) ^ X) +
          (((C + (0xffffffdc | ~C) + 0x24) & X) << 1);
```
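As a sanity check, the following standalone Python sketch (not part of O-MVLL; it emulates the 32-bit wrap-around with a mask and uses the bitwise negation `~C`) confirms that the rewritten expression still computes `C ^ 0x23` for every byte value:

```python
# Verify the MBA rewrite of data[i] ^ 0x23 over all byte values,
# emulating 32-bit wrap-around arithmetic with a mask.
M32 = 0xFFFFFFFF

def mba_xor_0x23(c: int) -> int:
    x = -(c & 0x23) & M32                              # int32_t X = -(C & 0x23);
    a = (c + (0xFFFFFFDC | (~c & M32)) + 0x24) & M32   # the repeated sub-expression
    return ((a ^ x) + ((a & x) << 1)) & 0xFF           # truncated back to uint8_t

assert all(mba_xor_0x23(c) == c ^ 0x23 for c in range(256))
```

The identity holds because `0xffffffdc` is `~0x23`, so `0xffffffdc | ~C` equals `~(C & 0x23)` and the whole sum collapses back to the xor.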

## How to use it?

In the O-MVLL configuration file, you must define the following method in the Python class implementing `omvll.ObfuscationConfig`:

```python
def obfuscate_arithmetic(self, mod: omvll.Module,
                               func: omvll.Function) -> omvll.ArithmeticOpt:
    if func.name == "encode":
        # Explicitly define the number of iterations
        # applied on the arithmetic expressions
        return omvll.ArithmeticOpt(iterations=8)
    if mod.name.endswith("encrypt.cpp"):
        # Obfuscate with a number of iterations defined by O-MVLL
        return True
    return False
```
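For context, a complete configuration file might look as follows. This is a sketch modeled on the O-MVLL sample configurations (the `omvll_get_config` entry point and the `@lru_cache` idiom follow those samples; double-check against the version you use):

```python
import omvll
from functools import lru_cache

class MyConfig(omvll.ObfuscationConfig):
    def __init__(self):
        super().__init__()

    def obfuscate_arithmetic(self, mod: omvll.Module,
                             func: omvll.Function) -> omvll.ArithmeticOpt:
        # Only obfuscate the arithmetic of the sensitive function
        if func.name == "encode":
            return omvll.ArithmeticOpt(iterations=8)
        return False

@lru_cache(maxsize=1)
def omvll_get_config() -> omvll.ObfuscationConfig:
    return MyConfig()
```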

The `iterations` parameter defines the number of times the transformation loop is applied to the arithmetic operation:

```
Operation : X ^ Y
#1 Loop : (X | Y) - (X & Y)
#2 Loop : (((Y + X) - (Y & X)) ^ ((Y | X) - (Y + X))) + 2*(((Y + X) - (Y & X)) & ((Y | X) - (Y + X)))
```
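Both expansions remain equivalent to the original xor modulo 2^32, which a short Python sketch (assumed names, just for illustration) can confirm on random inputs:

```python
import random

M32 = 0xFFFFFFFF  # emulate 32-bit wrap-around

def loop1(x, y):
    # Iteration #1: (X | Y) - (X & Y)
    return ((x | y) - (x & y)) & M32

def loop2(x, y):
    # Iteration #2: each operand of loop1 is itself rewritten as an MBA
    a = ((y + x) - (y & x)) & M32  # equals X | Y
    b = ((y | x) - (y + x)) & M32  # equals -(X & Y) mod 2^32
    return ((a ^ b) + 2 * (a & b)) & M32

random.seed(0)
for _ in range(1000):
    x, y = random.getrandbits(32), random.getrandbits(32)
    assert loop1(x, y) == loop2(x, y) == x ^ y
```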

It is **highly recommended** to run this pass with at least **2 iterations**.

See the next section for details.

## Implementation

The transformation of the arithmetic operations works by iterating over all the instructions
of a basic block and selecting the arithmetic operations (`+`, `-`, `^`, `&`, `|`) that can be transformed into
a Mixed Boolean-Arithmetic expression:

```cpp
for (Instruction& I : BB) {
  auto* BinOp = dyn_cast<BinaryOperator>(&I);
  if (BinOp != nullptr && isSupported(*BinOp)) {
    // Process ...
  }
}
```

In the first stage, processing an operation consists of extracting the arithmetic operation (e.g. `add`)
and wrapping it in a function.

For instance, let’s consider this addition:

```c
int c = a + b;
```

After creating its wrapper, it becomes:

```c
int __omvll_add(int x, int y) {
  return x + y;
}
int c = __omvll_add(a, b);
```

*“Why apply such a transformation instead of directly replacing the addition with its MBA equivalent?”*

First, LLVM is able to simplify MBA expressions, as discussed in the next section: Limitations & Attacks.
One solution to prevent these optimizations is to create a function flagged with the `OptimizeNone` attribute,
which is roughly equivalent to:

```c
__attribute__((optnone))
int __omvll_add(int x, int y) {
  return x + y;
}
```

The second advantage of this call-based transformation lies in the implementation of the iteration process. If we transformed the arithmetic operations on the fly in the original basic block, each iteration would have to re-process all the other instructions of the basic block.

With a wrapper, it only requires applying the iterations on the isolated function, not on all the instructions of the original basic block:

```c
__attribute__((optnone))
int __omvll_add(int x, int y) {
  int E = x + y;
  for (size_t i = 0; i < NB_ITERATIONS; ++i) {
    E = MBA_add_xor_sub_or_and(E);
  }
  return E;
}
```

Last but not least, we have to make sure that the wrapper function is inlined at the end of the processing; otherwise, it would be a weakness:

```cpp
Wrapper->addFnAttr(Attribute::AlwaysInline);
```

This LLVM attribute is equivalent to:

```c
__attribute__((optnone))
__attribute__((always_inline))
int __omvll_add(int x, int y) {
  int E = x + y;
  for (size_t i = 0; i < NB_ITERATIONS; ++i) {
    E = MBA_add_xor_sub_or_and(E);
  }
  return E;
}
```

Regarding the transformations themselves, they rely on the built-in LLVM pattern matcher:

```cpp
Instruction& I = ...;
Value *X, *Y;
// Match X & Y
if (!match(&I, m_And(m_Value(X), m_Value(Y)))) {
  return nullptr;
}
```

Upon a match, the instruction associated with the operation is replaced with a Mixed Boolean-Arithmetic (MBA) expression:

```cpp
// (X + Y) - (X | Y)
return BinaryOperator::CreateSub(
    Builder.CreateAdd(X, Y),
    Builder.CreateOr(X, Y));
```

The overall structure of the *transformer* is highly inspired by the InstCombine pass,
and the MBAs used in this pass are taken from sspam,
developed by Ninon Eyrolles:

| Operation | MBA Transformation |
|---|---|
| `X ^ Y` | `(X \| Y) - (X & Y)` |
| `X + Y` | `(X & Y) + (X \| Y)` |
| `X - Y` | `(X ^ -Y) + 2*(X & -Y)` |
| `X & Y` | `(X + Y) - (X \| Y)` |
| `X \| Y` | `X + Y + 1 + (~X \| ~Y)` |
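Each of these rewrites is an exact identity on two's-complement integers. A small Python sketch (illustrative names only) can check them on random 32-bit operands:

```python
import random

M32 = 0xFFFFFFFF  # 32-bit wrap-around mask

def neg(v):
    # two's-complement negation on 32 bits
    return -v & M32

# (reference operation, MBA rewrite) pairs from the table above
identities = {
    "X ^ Y": (lambda x, y: x ^ y, lambda x, y: (x | y) - (x & y)),
    "X + Y": (lambda x, y: x + y, lambda x, y: (x & y) + (x | y)),
    "X - Y": (lambda x, y: x - y, lambda x, y: (x ^ neg(y)) + 2 * (x & neg(y))),
    "X & Y": (lambda x, y: x & y, lambda x, y: (x + y) - (x | y)),
    "X | Y": (lambda x, y: x | y, lambda x, y: x + y + 1 + ((~x & M32) | (~y & M32))),
}

random.seed(0)
for name, (op, mba) in identities.items():
    for _ in range(1000):
        x, y = random.getrandbits(32), random.getrandbits(32)
        assert op(x, y) & M32 == mba(x, y) & M32, name
```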

## Limitations & Attacks

First and foremost, LLVM is able to *simplify* MBA expressions, as we can see in the file
`llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp`:

```cpp
...
// (A ^ B) ^ (A | C) --> (~A & C) ^ B -- There are 4 commuted variants.
if (match(&I, m_c_Xor(m_OneUse(m_Xor(m_Value(A), m_Value(B))),
                      m_OneUse(m_c_Or(m_Deferred(A), m_Value(C))))))
  return BinaryOperator::CreateXor(
      Builder.CreateAnd(Builder.CreateNot(A), C), B);

// (A ^ B) ^ (B | C) --> (~B & C) ^ A -- There are 4 commuted variants.
if (match(&I, m_c_Xor(m_OneUse(m_Xor(m_Value(A), m_Value(B))),
                      m_OneUse(m_c_Or(m_Deferred(B), m_Value(C))))))
  return BinaryOperator::CreateXor(
      Builder.CreateAnd(Builder.CreateNot(B), C), A);

// (A & B) ^ (A ^ B) -> (A | B)
if (match(Op0, m_And(m_Value(A), m_Value(B))) &&
    match(Op1, m_c_Xor(m_Specific(A), m_Specific(B))))
  return BinaryOperator::CreateOr(A, B);

// (A ^ B) ^ (A & B) -> (A | B)
if (match(Op0, m_Xor(m_Value(A), m_Value(B))) &&
    match(Op1, m_c_And(m_Specific(A), m_Specific(B))))
  return BinaryOperator::CreateOr(A, B);
...
```
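These folds are themselves exact bitwise identities, which is why InstCombine can apply them unconditionally. A quick Python spot-check of two of them:

```python
import random

M32 = 0xFFFFFFFF

random.seed(0)
for _ in range(1000):
    a, b, c = (random.getrandbits(32) for _ in range(3))
    # (A ^ B) ^ (A | C) --> (~A & C) ^ B
    assert (a ^ b) ^ (a | c) == ((~a & M32) & c) ^ b
    # (A & B) ^ (A ^ B) --> (A | B)
    assert (a & b) ^ (a ^ b) == a | b
```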

Hence, we must be careful not to trigger these simplifications (cf. Implementation).

Second, the MBAs currently used in this pass are:

- Fixed and statically encoded.
- Simplified by the LLVM optimization pipeline if misconfigured.
- Simplifiable with existing public tools once extracted.

For the third point, msynth or Triton can be used to verify whether the expressions can be simplified.

Nevertheless, **automatically** identifying **and** extracting MBAs from a compiled binary is not trivial,
so we can consider that this pass introduces a non-negligible overhead for reverse engineers.

For these reasons, and in its current form, this pass should be considered a *cosmetic protection* rather
than a state-of-the-art protection.

## References

###### Publications

###### Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions

by *Benjamin Reichenwallner, Peter Meerwald-Stadler*

###### Improving MBA Deobfuscation using Equality Saturation

by *Matteo Favaro, Tim Blazytko*

###### QSynth - A Program Synthesis based Approach for Binary Code Deobfuscation

by *Robin David, Luigi Coniglio, Mariano Ceccato*

###### Tools

###### msynth

by *Tim Blazytko*

###### qsynthesis

by *Robin David*

###### Talks

###### Advanced Code Obfuscation With MBA Expressions

by *Arnau Gàmez Montolio*

###### Attacks

###### Oracle Synthesis Meets Equality Saturation

*PoC associated with the blog post: 'Improving MBA Deobfuscation using Equality Saturation'.*

[^1]: The compiler might also optimize the operation into instructions that are equivalent but less meaningful from a reverse engineering perspective. For instance, `x % 8` is usually transformed into `x & 7`. Hacker’s Delight is a good reference for this kind of transformation.