Cross-Architecture Function Matching

Woodland, Samantha L.

Student Work

Malware almost always arrives without source code, so reverse engineers must examine malware binaries at the assembly level. Because malware authors frequently reuse code, new malware often overlaps with binaries that have already been analyzed. Function matching supports binary diffing: it reduces the manual labor required to examine each file and helps identify the differences between two binaries. When previously analyzed functions can be identified in new malware, the earlier results increase the speed and accuracy of an analyst’s reverse engineering. Function matching becomes more difficult across architectures, however. Even when two binaries were compiled from the same source code, their machine instructions cannot be compared directly, because instruction sets, calling conventions, and register sets differ among architectures. Our matcher, the PCodeMatcher, uses PCode, an intermediate representation language developed by the NSA for Ghidra, its reverse engineering framework. Ghidra uses PCode to decompile machine instructions into C code independently of the source architecture. Using a fuzzy string-matching technique, we were able to match 70% of functions in our cross-architecture sample set. We believe that, with further development, function matching using PCode could be a valuable tool for analyzing binaries in reverse engineering.
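
To illustrate the general idea of fuzzy matching over an architecture-independent representation, the following is a minimal sketch and not the PCodeMatcher itself. It assumes each function has already been exported as a list of opcode-first PCode text lines; the function names, the sample listings, the operand notation, and the 0.7 threshold are all hypothetical. The abstract does not say which fuzzy string-matching technique was used, so Python's standard `difflib.SequenceMatcher` stands in for it here.

```python
from difflib import SequenceMatcher


def normalize(pcode_ops):
    """Keep only the opcode mnemonic of each line, dropping operands.

    Assumption: every line starts with its opcode. Operands (registers,
    addresses) differ between architectures, so they are discarded.
    """
    return [op.split()[0] for op in pcode_ops]


def similarity(ops_a, ops_b):
    """Return a fuzzy similarity score in [0, 1] for two opcode listings."""
    matcher = SequenceMatcher(None, normalize(ops_a), normalize(ops_b),
                              autojunk=False)
    return matcher.ratio()


def best_match(target_ops, candidates, threshold=0.7):
    """Return (name, score) of the closest candidate, or (None, score)
    when no candidate reaches the (hypothetical) threshold."""
    best_name, best_score = None, 0.0
    for name, ops in candidates.items():
        score = similarity(target_ops, ops)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical, hand-written listings for illustration only.
x86_func = ["COPY eax, edi", "INT_ADD eax, 4", "LOAD ebx, eax",
            "INT_SUB ebx, 1", "STORE eax, ebx", "RETURN"]
arm_candidates = {
    "parse_header_arm": ["COPY r0, r1", "COPY r2, r0", "INT_ADD r2, 4",
                         "LOAD r3, r2", "INT_SUB r3, 1", "STORE r2, r3",
                         "RETURN"],
    "dispatch_arm": ["CBRANCH r0", "CALL f1", "BRANCH l2", "CALL f2",
                     "RETURN"],
}

name, score = best_match(x86_func, arm_candidates)
print(name, round(score, 3))
```

Discarding operands is one possible normalization; a practical matcher would likely need to weigh constants, call targets, and control-flow structure as well.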

This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.

Creator