Opaque Fields Access
As mentioned in the talk Automation Techniques in C++ Reverse Engineering by R. Rolles, reverse engineers spend a non-negligible amount of time identifying structures and their attributes.
And I completely agree!
To better understand how structures are involved in reverse engineering, let’s consider the following code which involves a JNI function:
#include <jni.h>
#include <string>

class SecretString {
  public:
  SecretString(const char* input) : value_(input) {}

  // Compare the stored value against the expected password
  bool check() {
    checked_ = (value_ == "OMVLL");
    return checked_;
  }

  private:
  bool checked_ = false;
  std::string value_;
};

bool check_jni_password(JNIEnv* env, jstring passwd) {
  const char* pass = env->GetStringUTFChars(passwd, nullptr);
  SecretString secret(pass);
  return secret.check();
}
When this code is compiled, env->GetStringUTFChars() is called through:
- An access to the GetStringUTFChars pointer in the JNIEnv structure.
- A call on the dereferenced pointer.
In assembly it looks like this:
ldr x8, [x8, #1352] ; 1352 is the offset of GetStringUTFChars
blr x8 ; in the JNIEnv structure
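The two steps above can be modelled with a small, self-contained sketch. The types MockFunctionTable and MockEnv, and the helpers slot_offset, call_slot and fake_get_string_utf_chars, are hypothetical stand-ins written for this illustration (they are not the real jni.h definitions). The sketch assumes 8-byte pointers, as on AArch64, and relies on GetStringUTFChars being entry 169 of the JNI function table, which gives 169 * 8 = 1352.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the JNI types (NOT the real jni.h).
using SlotFn = const char* (*)();

struct MockFunctionTable {
  SlotFn slots[256];  // one pointer-sized entry per JNI function
};

struct MockEnv {
  const MockFunctionTable* functions;  // the environment points to its function table
};

// GetStringUTFChars is entry 169 of the JNI function table.
constexpr std::size_t kGetStringUTFCharsIndex = 169;
// Assumption: pointers are 8 bytes wide (AArch64).
constexpr std::size_t kPointerSize = 8;

// Byte offset of a table entry, as it appears in the ldr immediate.
constexpr std::size_t slot_offset(std::size_t index) {
  return index * kPointerSize;
}

static_assert(slot_offset(kGetStringUTFCharsIndex) == 1352,
              "matches the offset seen in the assembly listing");

// Step 1: load the function pointer from the table. Step 2: call it.
inline const char* call_slot(const MockEnv& env, std::size_t index) {
  SlotFn fn = env.functions->slots[index];  // ldr x8, [x8, #offset]
  assert(fn != nullptr);
  return fn();                              // blr x8
}

// Dummy implementation used to populate the mock table.
inline const char* fake_get_string_utf_chars() { return "OMVLL"; }
```

Because the offset is a plain immediate in the load instruction, a disassembler that knows the table layout can map it back to the function name.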
When decompiling the check_jni_password function, we can indeed observe this offset, and most disassemblers can also resolve the structure’s attribute once the user has identified and provided its type.
Similarly, once we have identified and reversed the layout of the object behind the SecretString* this pointer, the SecretString::check function becomes a bit more meaningful.
The std::string structure is still a bit confusing, since the beginning of the decompiled code relates to the small-string optimization performed by the STL.
On the other hand, when this pass is applied to the JNIEnv and SecretString structures, the decompiled output remains confusing even if we manually define the types of the registers associated with JNIEnv and SecretString.
The following figures show the differences in BinaryNinja; the output of IDA is very close:
When to use it?
You should trigger this pass on structures that hold sensitive information. It might also be worth enabling this pass on the JNIEnv structure for JNI functions involved in sensitive computations.
How to use it?
You can trigger this pass by defining the method obfuscate_struct_access in the configuration class file:
def obfuscate_struct_access(self, _: omvll.Module, __: omvll.Function,
                            struct: omvll.Struct):
    if struct.name.endswith("JNINativeInterface"):
        return True
    if struct.name == "class.SecretString":
        return True
    return False
In the current version, O-MVLL expects a boolean value, but future versions should also accept an option on the access type (read or write). For instance:
    if struct.name == "class.SecretString":
        return omvll.StructAccessOpt(read=True, write=False)
Implementation
This pass starts with a first stage that identifies the LLVM load and store instructions: llvm::LoadInst and llvm::StoreInst.
Then, it processes the operands of these instructions to check whether they are used to access the content of a structure or an element of a global variable. In such a case, it resolves the name of the structure or of the global variable and calls the user-defined callback to determine whether the access should be obfuscated.
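The selection logic described above can be sketched on a toy instruction list. Everything in this example (Kind, ToyInst, Callback, select_accesses) is hypothetical and written only to illustrate the flow. It does not use LLVM types and is not O-MVLL's actual implementation, which works on llvm::LoadInst and llvm::StoreInst operands.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model of an instruction (hypothetical, NOT an LLVM type).
enum class Kind { Load, Store, Other };

struct ToyInst {
  Kind kind;
  bool is_field_access;     // true if the pointer operand targets a struct field
  std::string struct_name;  // e.g. "class.SecretString" (empty otherwise)
  unsigned offset;          // byte offset of the accessed field
};

// Stand-in for the user-defined callback: decides from the structure name.
using Callback = std::function<bool(const std::string&)>;

// Returns the indices of the instructions selected for obfuscation.
inline std::vector<std::size_t> select_accesses(
    const std::vector<ToyInst>& insts, const Callback& should_obfuscate) {
  assert(should_obfuscate && "a callback must be provided");
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    const ToyInst& inst = insts[i];
    // Stage 1: only loads and stores are candidates.
    if (inst.kind != Kind::Load && inst.kind != Kind::Store) continue;
    // Stage 2: the operand must resolve to a structure access.
    if (!inst.is_field_access) continue;
    // Stage 3: ask the user whether this structure should be protected.
    if (should_obfuscate(inst.struct_name)) selected.push_back(i);
  }
  return selected;
}
```

In this model the callback only receives the structure name, mirroring the struct.name checks of the configuration method shown earlier.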
Upon positive feedback from the user’s callback, O-MVLL transforms the access from this:
ldr x0, [x1, #offset];
Into that:
$var := #offset + 0;
ldr x0, [x1, $var];
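At the source level, this rewrite is roughly equivalent to replacing a direct field access with a read at a base address plus a computed offset. The sketch below is only an illustration: Record, read_direct and read_via_computed_offset are hypothetical names, and the volatile zero is a crude stand-in that merely keeps the addition alive in this example. It is not the protection O-MVLL applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical structure used only for this illustration.
struct Record {
  std::uint32_t id;
  std::uint32_t secret;
};

// Direct access: the compiler typically emits a load with an immediate offset.
inline std::uint32_t read_direct(const Record& r) {
  return r.secret;
}

// Same access, but the offset is the result of a computation (#offset + 0).
inline std::uint32_t read_via_computed_offset(const Record& r) {
  volatile std::size_t zero = 0;  // stand-in so the addition is not folded here
  const std::size_t offset = offsetof(Record, secret) + zero;
  std::uint32_t out;
  assert(offset + sizeof(out) <= sizeof(Record));
  // Read the field through the base address plus the computed offset.
  std::memcpy(&out, reinterpret_cast<const unsigned char*>(&r) + offset,
              sizeof(out));
  return out;
}
```

Both functions return the same value; only the way the field address is formed differs.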
Without any additional layer of protection, $var := #offset + 0; can be folded by the compiler, which would result in the original instruction.
To prevent this simplification, the instruction #offset + 0 is annotated [1] to automatically apply Opaque Constants and Arithmetic Obfuscation on this instruction:
IRBuilder<NoFolder> IRB(&Load);
Value* opaqueOffset =
  IRB.CreateAdd(ConstantInt::get(IRB.getInt32Ty(), 0),
                ConstantInt::get(IRB.getInt32Ty(), ComputedOffset));

if (auto* OpAdd = dyn_cast<Instruction>(opaqueOffset)) {
  addMetadata(*OpAdd, {MetaObf(OPAQUE_CST), MetaObf(OPAQUE_OP, 2llu)});
}
Limitations
This pass would not withstand the Dynamic Structure Reconstruction technique presented by R. Rolles in the talk mentioned in the introduction.
Nevertheless, this technique would require an AArch64 DBI, which does not exist yet [2].
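To see why a dynamic approach is not affected by this pass, consider the following toy model. It is only a sketch of the general idea, not the technique as implemented in the talk: MemAccess and recover_offsets are hypothetical names, and the trace is assumed to come from some instrumentation tool that records effective addresses at run time. Once the offset computation has executed, the effective address is concrete, so subtracting the object's base address yields the field offset no matter how the offset was computed.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// One recorded memory access (hypothetical trace format).
struct MemAccess {
  std::uint64_t address;  // effective address observed at run time
  unsigned size;          // access width in bytes
};

// Given the base address and size of a traced object, return the set of
// offsets that were touched inside it.
inline std::set<std::uint64_t> recover_offsets(
    const std::vector<MemAccess>& trace, std::uint64_t base,
    std::uint64_t object_size) {
  assert(object_size > 0);
  std::set<std::uint64_t> offsets;
  for (const MemAccess& access : trace) {
    // Keep only the accesses that fall inside the object.
    if (access.address >= base && access.address < base + object_size) {
      offsets.insert(access.address - base);
    }
  }
  return offsets;
}
```

The recovered offsets, together with the access widths, are the raw material from which a structure layout can be rebuilt.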
References
Publications
Ghidra 101: Creating Structures in Ghidra, by Craig Young
Automation Techniques in C++ Reverse Engineering, by Rolf Rolles
Automated Reverse Engineering of Relationships Between Data Structures in C++ Binaries, by Nick Collisson for NCC Group
Tools
dynStruct, by Daniel Mercier
[1] See the section Annotations for the details.
[2] I personally worked on this support in Quarkslab’s QBDI, but since I left the company, this support is owned by Quarkslab. It might be published by Quarkslab though.