4. Tutorials¶
Each one of these tutorials are for the Intel platform.
4.1. Disassembling the current address¶
This is an introductory tutorial which will show an example of a multicased function as well as exposing the user to aliases.
- Most multicased functions have a variation that takes no
parameters in order to imply the current address. Another
way to do this, however, is to use
database.here()
which is aliased asdatabase.h
. This function is called so often, however, that its been imported into the default namespace of this plugin. So we can literally just callh()
.
> print hex(h())
- To disassemble this instruction, we’ll rely on the
database.disassemble()
function that is aliased asdatabase.disasm
.
> print db.disasm(h())
- As mentioned before, most multicased functions have a no-parameter variation which refers to the current address. Therefore another way of accomplishing this can be the following.
> print db.disasm()
4.2. Setting breakpoints (WinDbg) on all labels in a function¶
This is an easy tutorial that will discuss tagging. Tags will be used to filtering functions for specific instruction types, identifying operands, and eventually generating breakpoints that can be imported into WinDbg.
- We’ll start with by navigating to a function in IDA. Let’s just store its address for now.
> f = func.address()
- We’ll need to identify an address that has a label. We can use
database.type.has_label()
for that. Let’s navigate to a label and make sure it works (of course it works).
> print db.t.has_label()
True
>
- Now we can iterate through all of the function’s chunks while looking for a label. Instead of using a list comprehension, (which are incomprehensible to most), let’s just use a straight up for-loop.
> result = []
> for ea in func.iterate(f):
if db.t.has_label(ea):
result.append(ea)
continue
>
- This is easy enough, but there’s a better way using tagging. By
using tagging, we can keep navigating to functions that we want to
collect labels in and then aggregate them for later. To grab the
label, it’s simply a name that we can grab with
database.name()
. So let’s tag up each label with the key “labels_to_get”.
> for ea in func.iterate(f):
if db.t.has_label(ea):
db.tag(ea, 'labels_to_get', db.name(ea))
continue
>
- So now, we can do this to any function we want and you’ll notice that each address with a label now includes a tag as its comment. Now we can output something to paste into WinDbg (or actually write to a file that we can then use $$< to execute).
> for res in db.selectcontents('labels_to_get'):
for ea, res in func.select(*res):
print r'bp %x ".printf \"Hit label %s\n\""'%( ea, res['labels_to_get'] )
continue
>
- So now we’ve outputted a list of breakpoints to feed into our
debugger. This will only work if the base address of our database
matches our image base in our debugger. But…we can actually just
feed an offset to our debugger instead. This will allow our
breakpoint to be independent of our base address. To get our module
name, we can use
database.module()
, and to convert our address to an offset, we can usedatabase.offset()
.
> for res in db.selectcontents('labels_to_get'):
for ea, res in func.select(*res):
print r'bp %s+%x ".printf \"Hit label %s\n\""'%( db.module(), db.offset(ea), res['labels_to_get'])
continue
>
- And now we have some breakpoints that output the label they execute.
4.3. Tagging all dynamic calls in the database¶
Similar to above, we will use tags to mark all the dynamic calls in the database. This is a medium difficulty tutorial that will also touch on tag importing an exporting.
When we’re done, we’ll also remove the tags we’ve created to avoid cluttering things up. Let’s pretend we’re looking at Delphi and we want to identify all functions that allocate something and tag all of the dynamic calls within them.
First we’ll need to enumerate all the functions that we care
about. We can do that via database.functions.list()
and then use database.functions.iterate()
to select
a subset of them, or we can just iterate through everything in the
database via database.functions()
.
- To start out, let’s assume that we have most of the System package already named. So, let’s look for functions within that package that do stuff related to memory.
> db.functions.list('System.*Memory*')
...
>
- We turned up some results, so let’s assume that we like them. Now
we can use
database.functions.iterate()
to iterate through our results and then usefunction.tag()
to tag them for later.
> for ea in db.functions.iterate('System.*Memory*'):
func.tag(ea, 'is-memory-function', 1)
>
- Let’s expand our search a little bit by also tagging the callers of
these functions. This can be done by using
function.up()
. Our tag name “is-memory-function” doesn’t make sense, so we’ll tag the callers with “calls-memory-function”.
> for ea, res in db.select('is-memory-function'):
for ea in func.up(ea):
func.tag(ea, 'calls-memory-function', 1)
continue
>
- Now that we have all of our functions tagged with “is-memory-function”,
or “calls-memory-function. These can both be queried
database.select()
to select them. Since we’re searching for either tag (or), we’ll use theOr
parameter to return any function that has either tag assigned. We plan on iterating through these results, so we’ll need to usefunction.chunks.iterate()
(or really its aliasfunction.iterate
) to look for our instruction type. To test for an indirect call instruction (a call which branches to a register or a phrase), we can simply use theinstruction.is_calli()
function.
> for ea, res in db.select(Or=('is-memory-function', 'calls-memory-function')):
for ea in func.iterate(ea):
if ins.is_calli(ea):
db.tag(ea, 'indirect-call', 1)
continue
continue
>
- Just to keep our function comments clean, let’s untag both the function tags that we applied. Since we tagged the contents of these functions with the tag “indirect-call”, querying for this contents tag will end up giving us the subset of the results we care about.
> for ea, res in db.select(Or=('is-memory-function', 'calls-memory-function')):
for tagname, value in res.iteritems():
oldvalue = func.tag(ea, tagname, None)
print "Removing tag %s from function %x: %s"% (tagname, ea, func.name(ea))
continue
>
- After cleaning up, now we should have the actual dynamic call
instructions tagged in the contents of our functions. So to continue,
let’s tag the operand type for each instruction. This way we can
determine which registers the instructions’ operands are composed
of. We can do this using
instruction.op_type()
which is aliased asinstruction.opt
. Actually, in order to check our results, let’s actually store all of the operand types using its plural form,instruction.ops_type
. As usual, this has an abbreviated alias ofinstruction.opts()
. We’ll also keep things clean again, by removing the previous tag, “indirect-call”.
> for ea, res in db.selectcontents('indirect-call'):
for ea, res in func.select(ea, *res):
print "Tagging address %x with %d operands"% (ea, ins.ops_count(ea))
db.tag(ea, 'call-optypes', ins.opts(ea))
print "Removing old \"%s\" tag from %x"% ('indirect-call', ea)
db.tag(ea, 'indirect-call', None)
continue
>
- Just to sanity check things, lets prove that all of the calls that we
care about really only have one operand. To do this, we’ll output their
address using the
database.disassemble()
function which is aliased asdatabase.disasm
and also tag them so we can refer to them later. We’ll do this removal by passing theNone
parameter todatabase.tag()
.
> for res in db.selectcontents('call-optypes'):
for ea, res in func.select(*res):
if len(res['call-optypes']) != 1:
print "Unknown operand count %d for instruction: %s"% (len(res['call-optypes']), db.disasm(ea))
db.tag(ea, 'calli-unknown', 1)
print "Removing old tag \"%s\" from %x"% ('call-optypes', ea)
db.tag(ea, 'calli-optypes', None)
continue
continue
>
- Now if we want, we can manually go through all of the “calli-unknown”
contents tags and figure out what is odd about them. But, we’re
really only interested in the registers for the first operand. To
decode the first operand, we can use
instruction.op_value()
which is aliased asinstruction.op()
. Now operands that are composed of registers (or symbols) inherit from thesymbol_t
type. This type has asymbols
property which will allow one to enumerate the symbols (really registers) belonging to an operand. So, let’s go ahead and identify our “call-optypes” instructions again, and create a new tag, “call-opregs”. This new tag will contain all of the registers we need to resolve the target address of the branch instructions that we’ve selected.
> for res in db.selectcontents('call-optypes'):
for ea, res in func.select(*res):
op = ins.op(ea, 0)
regnames = []
for symbol in op.symbols:
regnames.append(symbol.name)
print "Tagging %x with %s containing the regs %r"% (ea, 'call-opregs', regnames)
db.tag(ea, 'call-opregs', regnames)
continue
>
- Tagging these registers for each call instruction is actually going
to be useful to pass along to a debugger. With this we know which
register to dump for a call instruction in order to calculate its
target. Instead of calculating them though, let’s remain hacky and
just output their results as a breakpoint. In the prior tutorial,
we chose
database.offset()
in order to calculate the relative address. Instead of doing it that way, there’s a class in thetools
module that we can use to transform an address namedtools.remote
. So let’s use this instead. To construct this class, we’ll need to pass our remote base address as a parameter.
> R = tools.remote(remote_base_address)
> print hex( R.get(h()) )
- Now that we have an instance of
tools.remote
, we can select our instructions tagged with “call-opregs” and produce a breakpoint for each one. Let’s do that.
> for res in db.selectcontents('call-opregs'):
for ea, res in func.select(*res):
emit_registers = ''
for regname in res['call-opregs']:
emit_registers += "r @%s;"% regname
# "put" our address into the debugger
remote_ea = R.put(ea)
print r'bp %x ".printf \"Hit call %s\n\";%s;g"'% (remote_ea, db.disasm(ea), emit_registers)
continue
>
- And now we’ve just outputted some breakpoints that we can feed into WinDbg which will emit the values of any registers that are required to branch via a call instruction. Let’s redo this because we might want to save these breakpoints for later. We’ll take the breakpoint that we generated for each instruction, and then store is via the tag “break-calli”.
> for res in db.selectcontents('call-opregs'):
for ea, res in func.select(*res):
emit_registers = ''
for regname in res['call-opregs']:
emit_registers += "r @%s;"% regname
bpstr = r'.printf "Hit call %s\n";%s;gc'% (db.disasm(ea), emit_registers)
db.tag(ea, 'break-calli', bpstr)
continue
>
- Now that we have the breakpoints stored, the next time we open this
database we should be able to generate the breakpoints for WinDbg
at time that we need them. This data can also be shared with other
users so that they will also have the access to the same information.
Just for fun, let’s serialize this data so that we can transport this
to another user. Rather than writing the queries to do this manually,
we can utilise one of the functions provided by the
tools.tags
module. Namely thetools.tags.export
. We only want to give them the “break-calli” tags which can be exported via the following code.
> data = tools.tags.export('break-calli')
>
> import pickle, os.path
> filename = os.path.join(db.path(), 'breakpoints.pickle')
> with file(filename, 'wb') as output:
pickle.dump(data, output)
>
> print "Dumped breakpoints to %s"% filename
- Again, if a user wants to import these pickled tags into their database,
then can simply unpickle the object and then use the functionality
available within the
tools.tags
module to do this.
> filename = os.path.join(db.path(), 'breakpoints.pickle')
> with file(filename, 'rb') as input:
data = pickle.load(input)
>
> tools.tags.apply(data)
- Unfortunately, this will overwrite any tags in the current database with
the name “break-calli”. If the user wants to map these tags to a different
name, however, they can provide a dictionary of tag mappings as a keyword
parameter to
tools.tags.apply
.
> tools.tags.apply(data, **{'break-calli': 'username.break-calli'})
>
4.4. Marking all functions that are “leaves”¶
This tutorial is somewhat “advanced”. Other than using tags as described in the prior tutorials, this will also discuss ways to use the combinators provided by this plugin.
- Knowing whether a function is a utility function that doesn’t call anything
might reduce the time it takes a reverser to determine the complexity of a
function. This plugin makes it pretty easy to do this thanks to the help
of functions like
function.down()
or the combination offunction.chunks.iterate()
andinstruction.is_call()
. So, let’s use these tools to define a function that returns whether a function calls other functions or not.
> def has_children(ea):
if len(func.down()) > 0:
return True
return False
>
> print has_children(h())
13
>
- One issue with using
function.down()
is since it only returns addresses that a function is capable of calling, it will still returnFalse
if the function we apply it to makes an indirect call. Let’s improve this by looking for any call via the following variation.
> def has_children(ea):
res = []
for ea in func.iterate(ea):
if ins.is_call(ea):
res.append(ea)
continue
return True if len(res) > 0 else False
>
- Another way to do this is via the combination of an anonymous function (lambda) and a list comprehension. This would look like the following code.
> has_children = lambda ea: True if len([ea for ea in func.iterate(ea) if ins.is_call(ea)]) > 0 else False
- Yet another way involves using the functional combinator component of
this plugin (see Functional combinators). To assist with these types
of one-liners, this plugin includes a number of combinators that can
be combined to build the exact same function. If we combine the
fpartial()
,ifilter()
, and some operators available via Python’soperator
module with thefcompose()
combinator we can implement our prior 2 implementations of thehas_children()
function with the following code.
> print "first we need to iterate through all addresses in function"
> func_iterator = func.iterate
>
> print "now we'll filter for all call instructions"
> func_callFilter = fcompose(func_iterator, fpartial(ifilter, ins.is_call))
>
> print "now we'll convert our ifilter into a list so we can count them"
> func_callLister = fcompose(func_callFilter, list)
>
> print "convert our list of call instructions into a count"
> func_callCounter = fcompose(func_callLister, len)
>
> print "now we want to return true if operator.lt(0, len(list( call_instructions )))"
> func_callComparison = fcompose(func_callCounter, fpartial(operator.lt, 0))
>
> print "this will now return true is the number of call instructions is > 0"
> has_children = func_callComparison
>
> print 'combined we have'
> has_children = fcompose(func.iterate, fpartial(ifilter, ins.is_call), list, len, fpartial(operator.lt, 0))
- The combination of these primitives can provide some potentially very
powerful tools if a user chooses to use this method. Nonetheless, it
is up to the user and their own personal preference. This function that
we’ve created,
has_children()
, will now be used to tag all of the functions that have no children. To start out, however, let’s create another function that will tag a function with the tag “function-type” and the value “leaf” ifhas_children()
returnsFalse
.
> def tag_if_leaf(ea):
if has_children(ea):
func.tag(ea, 'function-type', 'leaf')
return
>
- In the prior tutorials we used the
database.functions
namespace to enumerate each function. In this case we’ll use another useful function in that is provided to us by thetools
module. This functions istools.map()
and takes a callable as its first parameter. Normally, this callable will be passed an address for each function within the database. This callable will then be executed against every function similar to usingdatabase.functions()
. One thing that is interesting abouttools.map()
, however, is that it has the ability to detect the type of callable that is passed to it. If the callable takes two parameters, it will assume that the user intended an index, and an address to be passed to it. This can be used to detect how far alongtools.map()
has processed. Let’s redefine the abovetag_if_leaf()
function again.
> def tag_if_leaf(index, ea, **kwargs):
total = kwargs['total']
print "Percentage complete: %f"% (index / float(total))
if not has_children(ea):
func.tag(ea, 'function-type', 'leaf')
return (ea, False)
return (ea, True)
>
- Now that we have a callable to pass to
tools.map()
, we can simply hand it our callable and proceses the entire database.
> total = len(db.functions())
> res = tools.map(tag_if_leaf, total=len(db.functions()))
- Now, not only are all the “leaf” functions tagged, the variable
res
contains a list of tuples containing each function’s address, and whether or not it has any children. Let’s convert this to a Pythondict
and tag any functions that contain only indirect calls. This way we can distinguish if any of these functions are wrappers that use virtual methods.
> children_lookup = dict(res)
> for ea in children_lookup:
if children_lookup[ea] and len(func.down(ea)) == 0:
func.tag(ea, 'function-type', 'virtual-wrapper')
continue
>
- Now each function containing a call is tagged with “function-type” being equivalent to “leaf” or “virtual-wrapper” depending on whether no functions are called, or only indirect calls are made.
4.5. Using the user-interface (ui) module to map hotkeys to a function¶
This plugin wraps a number of the capabilities that are exposed by
IDAPython. One of these capabilities are to be able to interact with
IDA’s user-interface. This is exposed by the plugin via the ui
module. This tutorial introduces some of the user-interface wrappers
that are available and can be used to develop quick tools that the user
needs as they’re reversing a target.
- When doing iterative changes to the different content within a
database, it is common to perform actions such as apply a structure,
convert a list of data items into an array, convert a range of bytes
into a string, and so forth. As an example, when reversing C++ targets
where runtime-type-information is available (RTTI), information about
a class such as its size, number of virtual methods, etc. are available,
but not packed into a structure by IDA. Most of this is done relative
to whatever object the user is currently working with. The current
object the user is working with can be identifed with the
ui.current
namespace. This can be combined with other functionality within the plugin in order to automate a large aspect of the reverser’s chores.
> help(ui.current)
Help on class current in module ui:
class current(__builtin__.object)
| This namespace contains tools for fetching information about the
| current selection state. This can be used to get the state of
| thigns that are currently selected such as the address, function,
| segment, clipboard, widget, or even the current window in use.
|
| Class methods defined here:
|
| address(cls) from __builtin__.type
| Return the current address.
|
| color(cls) from __builtin__.type
| Return the color of the current item.
|
| function(cls) from __builtin__.type
| Return the current function.
|
| opnum(cls) from __builtin__.type
| Return the currently selected operand number.
|
| segment(cls) from __builtin__.type
| Return the current segment.
|
| selected = selection(*arguments, **keywords) from __builtin__.type
| Alias for `ui.current.selection`.
|
| selection(cls) from __builtin__.type
| Return the current address range of whatever is selected
|
| status(cls) from __builtin__.type
| Return the IDA status.
|
| symbol(cls) from __builtin__.type
| Return the current highlighted symbol name.
|
| widget(cls) from __builtin__.type
| Return the current widget that the mouse is hovering over.
|
| window(cls) from __builtin__.type
| Return the current window that is being used.
- When an RTTI entry is identified by IDA, it looks like the following unstructured data. With a cross-reference existing at address 0x723324. This data isn’t presently of use to a reverser, and it would be much easier to use if it was structured. Normally the reverser would need to navigate to the “Structures” view, create the structure, and then use ‘Alt+Q’ to apply the structure to the address at 0x723308.
.rdata:00723308 BC 34 72 00 off_723308 dd offset str.CNxPrefDigSigDlg
.rdata:0072330C 68 db 68h ; h
.rdata:0072330D 01 db 1
.rdata:0072330E 00 db 0
.rdata:0072330F 00 db 0
.rdata:00723310 FF db 0FFh ; ÿ
.rdata:00723311 FF db 0FFh ; ÿ
.rdata:00723312 00 db 0
.rdata:00723313 00 db 0
.rdata:00723314 00 db 0
.rdata:00723315 00 db 0
.rdata:00723316 00 db 0
.rdata:00723317 00 db 0
.rdata:00723318 E0 55 42 00 dd offset sub_4255E0
.rdata:0072331C 00 db 0
.rdata:0072331D 00 db 0
.rdata:0072331E 00 db 0
.rdata:0072331F 00 db 0
.rdata:00723320 00 db 0
.rdata:00723321 00 db 0
.rdata:00723322 00 db 0
.rdata:00723323 00 db 0
.rdata:00723324 C8 7C 7A 00 dd offset const CNxPrefDigSigDlg::`RTTI Complete Object Locator'
- To deal with this, we will first create the structure at the IDAPython
command prompt based on what we know. Depending on the C++ implementation,
this structure can be named CRuntimeClass, begins with a pointer to
a string, contains a pointer to the constructor, and ends with a reference
to the object locator. We will use
struc.new
to create it and add some members.
> st = struc.new('CRuntimeClass')
> st.members.add('p_name_0', int)
> st.members.add('v_size_4', int)
> st.members.add('v_8', int)
> st.members.add('pf_constructor_c', int)
> st.members.add('p_parentClass_10', int)
> st.members.add('p_14', int)
> st.members.add('p_18', int)
> st.members.add('p_runtimeTypeInformation_1c', int)
- Now that we have a structure stored in the st variable, we can use
db.set.structure
to apply it to an address. After applying the structure, we can then name it withdb.name
, or even tag it using thedb.tag
function in order to search for it later. Asdb.set.structure
returns the structure that was created, we will use this combined withdb.get.string
to create the new name for the address.
> def make_rtti(ea):
res = db.set.structure(ea, st)
name_address = res['p_name_0'].value
name = db.get.string(name_address)
# database.name(ea, 'gv', 'rtti', name, db.offset(ea))
database.name(ea, 'rtti' + '_' + name)
return res
- Now we can use this function, whilst passing the target address as a
parameter, to automatically apply the structure to the given address.
One issue with this, however, is that typically when dealing with a
CRuntimeClass, the cross-reference that navigates you to the class
will point to the p_runtimeTypeInformation_1c field which will
require you to seek by -0x1c bytes in order to apply the structure.
We can remedy this by assuming IDA will label the string, and use
the
db.a.prevlabel
function to get an address pointing to the previously defined label.
> def make_rtti_from_reference(ea):
name_address = db.a.prevlabel(ea)
return make_rtti(name_address)
- Now that we have a way to apply the structure to the address
identified by the previously labeled name, we can simply iterate
through all the cross-references pointing to the “RTTI Object Locator”,
and apply this function to it. However, what if we’re unsure that
every single cross-reference has this particular format? If that’s
turns out to be the case, then it might be better to do this on
command instead of programmatically. We’ll first define a function
that uses the current address via either the
ui.current.address()
function, or theh
function.
> def apply_rtti_here():
ea = h()
res = make_rtti_from_reference(ea)
constructor = res['pf_constructor_c'].value
db.tag(ea, 'rtti.constructor', constructor)
- Now we have a function that when executed, will use the current address,
and apply our structure to that given address. One issue with this is
the need to type the function into IDAPython in order to execute it. In
order to work around this, we can use the
ui.keyboard.map()
function to map it to a hotkey. We will use the ‘Ctrl+P’ hotkey.
> ui.keyboard.map('Ctrl-P', apply_rtti_here)
<capsule object "$valid$" at 0x7fe7812b51e0>
- At this point, we can simply navigate to a cross-reference pointing in
front of a label that we want to convert to a structure, and then hit
the ‘Ctrl+P’ hotkey to execute our function. This allows us to automate
some aspect of our reversing in order to annotate the database belonging
to the target. When we’re done, we can simply use the
ui.keyboard.unmap()
function to unmap our hotkey to restore IDA’s original functionality.
> ui.keyboard.unmap('Ctrl-P')
- This plugin includes a number of ways to interact with IDA’s user-interface.
Please review the help for the
ui
module for some of its other capabilities.
4.6. Conclusion¶
There are a variety of different features available in this plugin that
can allow users to automate different aspects of their reverse-engineering
project. It is recommended by the author to explore the different modules
by using Python’s help()
function to see what is available.
This plugin was written with the intention of enabling a reverse-engineer to automate many issues that one may encounter while reversing without investing in too much development effort. The author hopes that these examples help demonstrate the flexibility that is provided by this plugin.