4. Tutorials

Each one of these tutorials are for the Intel platform.

4.1. Disassembling the current address

This is an introductory tutorial which will show an example of a multicased function as well as exposing the user to aliases.

  1. Most multicased functions have a variation that takes no parameters in order to imply the current address. Another way to do this, however, is to use database.here() which is aliased as database.h. This function is called so often, however, that its been imported into the default namespace of this plugin. So we can literally just call h().
> print hex(h())
  1. To disassemble this instruction, we’ll rely on the database.disassemble() function that is aliased as database.disasm.
> print db.disasm(h())
  1. As mentioned before, most multicased functions have a no-parameter variation which refers to the current address. Therefore another way of accomplishing this can be the following.
> print db.disasm()

4.2. Setting breakpoints (WinDbg) on all labels in a function

This is an easy tutorial that will discuss tagging. Tags will be used to filtering functions for specific instruction types, identifying operands, and eventually generating breakpoints that can be imported into WinDbg.

  1. We’ll start with by navigating to a function in IDA. Let’s just store its address for now.
> f = func.address()
  1. We’ll need to identify an address that has a label. We can use database.type.has_label() for that. Let’s navigate to a label and make sure it works (of course it works).
> print db.t.has_label()
True
>
  1. Now we can iterate through all of the function’s chunks while looking for a label. Instead of using a list comprehension, (which are incomprehensible to most), let’s just use a straight up for-loop.
> result = []
> for ea in func.iterate(f):
      if db.t.has_label(ea):
          result.append(ea)
      continue
>
  1. This is easy enough, but there’s a better way using tagging. By using tagging, we can keep navigating to functions that we want to collect labels in and then aggregate them for later. To grab the label, it’s simply a name that we can grab with database.name(). So let’s tag up each label with the key “labels_to_get”.
> for ea in func.iterate(f):
      if db.t.has_label(ea):
          db.tag(ea, 'labels_to_get', db.name(ea))
      continue
>
  1. So now, we can do this to any function we want and you’ll notice that each address with a label now includes a tag as its comment. Now we can output something to paste into WinDbg (or actually write to a file that we can then use $$< to execute).
> for res in db.selectcontents('labels_to_get'):
      for ea, res in func.select(*res):
          print r'bp %x ".printf \"Hit label %s\n\""'%( ea, res['labels_to_get'] )
      continue
>
  1. So now we’ve outputted a list of breakpoints to feed into our debugger. This will only work if the base address of our database matches our image base in our debugger. But…we can actually just feed an offset to our debugger instead. This will allow our breakpoint to be independent of our base address. To get our module name, we can use database.module(), and to convert our address to an offset, we can use database.offset().
> for res in db.selectcontents('labels_to_get'):
      for ea, res in func.select(*res):
          print r'bp %s+%x ".printf \"Hit label %s\n\""'%( db.module(), db.offset(ea), res['labels_to_get'])
      continue
>
  1. And now we have some breakpoints that output the label they execute.

4.3. Tagging all dynamic calls in the database

Similar to above, we will use tags to mark all the dynamic calls in the database. This is a medium difficulty tutorial that will also touch on tag importing an exporting.

When we’re done, we’ll also remove the tags we’ve created to avoid cluttering things up. Let’s pretend we’re looking at Delphi and we want to identify all functions that allocate something and tag all of the dynamic calls within them.

First we’ll need to enumerate all the functions that we care about. We can do that via database.functions.list() and then use database.functions.iterate() to select a subset of them, or we can just iterate through everything in the database via database.functions().

  1. To start out, let’s assume that we have most of the System package already named. So, let’s look for functions within that package that do stuff related to memory.
> db.functions.list('System.*Memory*')
...
>
  1. We turned up some results, so let’s assume that we like them. Now we can use database.functions.iterate() to iterate through our results and then use function.tag() to tag them for later.
> for ea in db.functions.iterate('System.*Memory*'):
      func.tag(ea, 'is-memory-function', 1)
>
  1. Let’s expand our search a little bit by also tagging the callers of these functions. This can be done by using function.up(). Our tag name “is-memory-function” doesn’t make sense, so we’ll tag the callers with “calls-memory-function”.
> for ea, res in db.select('is-memory-function'):
      for ea in func.up(ea):
          func.tag(ea, 'calls-memory-function', 1)
      continue
>
  1. Now that we have all of our functions tagged with “is-memory-function”, or “calls-memory-function. These can both be queried database.select() to select them. Since we’re searching for either tag (or), we’ll use the Or parameter to return any function that has either tag assigned. We plan on iterating through these results, so we’ll need to use function.chunks.iterate() (or really its alias function.iterate) to look for our instruction type. To test for an indirect call instruction (a call which branches to a register or a phrase), we can simply use the instruction.is_calli() function.
> for ea, res in db.select(Or=('is-memory-function', 'calls-memory-function')):
      for ea in func.iterate(ea):
          if ins.is_calli(ea):
              db.tag(ea, 'indirect-call', 1)
          continue
      continue
>
  1. Just to keep our function comments clean, let’s untag both the function tags that we applied. Since we tagged the contents of these functions with the tag “indirect-call”, querying for this contents tag will end up giving us the subset of the results we care about.
> for ea, res in db.select(Or=('is-memory-function', 'calls-memory-function')):
      for tagname, value in res.iteritems():
          oldvalue = func.tag(ea, tagname, None)
          print "Removing tag %s from function %x: %s"% (tagname, ea, func.name(ea))
      continue
>
  1. After cleaning up, now we should have the actual dynamic call instructions tagged in the contents of our functions. So to continue, let’s tag the operand type for each instruction. This way we can determine which registers the instructions’ operands are composed of. We can do this using instruction.op_type() which is aliased as instruction.opt. Actually, in order to check our results, let’s actually store all of the operand types using its plural form, instruction.ops_type. As usual, this has an abbreviated alias of instruction.opts(). We’ll also keep things clean again, by removing the previous tag, “indirect-call”.
> for ea, res in db.selectcontents('indirect-call'):
      for ea, res in func.select(ea, *res):
          print "Tagging address %x with %d operands"% (ea, ins.ops_count(ea))
          db.tag(ea, 'call-optypes', ins.opts(ea))
          print "Removing old \"%s\" tag from %x"% ('indirect-call', ea)
          db.tag(ea, 'indirect-call', None)
      continue
>
  1. Just to sanity check things, lets prove that all of the calls that we care about really only have one operand. To do this, we’ll output their address using the database.disassemble() function which is aliased as database.disasm and also tag them so we can refer to them later. We’ll do this removal by passing the None parameter to database.tag().
> for res in db.selectcontents('call-optypes'):
      for ea, res in func.select(*res):
          if len(res['call-optypes']) != 1:
              print "Unknown operand count %d for instruction: %s"% (len(res['call-optypes']), db.disasm(ea))
              db.tag(ea, 'calli-unknown', 1)
              print "Removing old tag \"%s\" from %x"% ('call-optypes', ea)
              db.tag(ea, 'calli-optypes', None)
          continue
      continue
>
  1. Now if we want, we can manually go through all of the “calli-unknown” contents tags and figure out what is odd about them. But, we’re really only interested in the registers for the first operand. To decode the first operand, we can use instruction.op_value() which is aliased as instruction.op(). Now operands that are composed of registers (or symbols) inherit from the symbol_t type. This type has a symbols property which will allow one to enumerate the symbols (really registers) belonging to an operand. So, let’s go ahead and identify our “call-optypes” instructions again, and create a new tag, “call-opregs”. This new tag will contain all of the registers we need to resolve the target address of the branch instructions that we’ve selected.
> for res in db.selectcontents('call-optypes'):
      for ea, res in func.select(*res):
          op = ins.op(ea, 0)
          regnames = []
          for symbol in op.symbols:
              regnames.append(symbol.name)
          print "Tagging %x with %s containing the regs %r"% (ea, 'call-opregs', regnames)
          db.tag(ea, 'call-opregs', regnames)
      continue
>
  1. Tagging these registers for each call instruction is actually going to be useful to pass along to a debugger. With this we know which register to dump for a call instruction in order to calculate its target. Instead of calculating them though, let’s remain hacky and just output their results as a breakpoint. In the prior tutorial, we chose database.offset() in order to calculate the relative address. Instead of doing it that way, there’s a class in the tools module that we can use to transform an address named tools.remote. So let’s use this instead. To construct this class, we’ll need to pass our remote base address as a parameter.
> R = tools.remote(remote_base_address)
> print hex( R.get(h()) )
  1. Now that we have an instance of tools.remote, we can select our instructions tagged with “call-opregs” and produce a breakpoint for each one. Let’s do that.
> for res in db.selectcontents('call-opregs'):
      for ea, res in func.select(*res):
          emit_registers = ''
          for regname in res['call-opregs']:
              emit_registers += "r @%s;"% regname

          # "put" our address into the debugger
          remote_ea = R.put(ea)
          print r'bp %x ".printf \"Hit call %s\n\";%s;g"'% (remote_ea, db.disasm(ea), emit_registers)
      continue
>
  1. And now we’ve just outputted some breakpoints that we can feed into WinDbg which will emit the values of any registers that are required to branch via a call instruction. Let’s redo this because we might want to save these breakpoints for later. We’ll take the breakpoint that we generated for each instruction, and then store is via the tag “break-calli”.
> for res in db.selectcontents('call-opregs'):
      for ea, res in func.select(*res):
          emit_registers = ''
          for regname in res['call-opregs']:
              emit_registers += "r @%s;"% regname
          bpstr = r'.printf "Hit call %s\n";%s;gc'% (db.disasm(ea), emit_registers)
          db.tag(ea, 'break-calli', bpstr)
      continue
>
  1. Now that we have the breakpoints stored, the next time we open this database we should be able to generate the breakpoints for WinDbg at time that we need them. This data can also be shared with other users so that they will also have the access to the same information. Just for fun, let’s serialize this data so that we can transport this to another user. Rather than writing the queries to do this manually, we can utilise one of the functions provided by the tools.tags module. Namely the tools.tags.export. We only want to give them the “break-calli” tags which can be exported via the following code.
> data = tools.tags.export('break-calli')
>
> import pickle, os.path
> filename = os.path.join(db.path(), 'breakpoints.pickle')
> with file(filename, 'wb') as output:
      pickle.dump(data, output)
>
> print "Dumped breakpoints to %s"% filename
  1. Again, if a user wants to import these pickled tags into their database, then can simply unpickle the object and then use the functionality available within the tools.tags module to do this.
> filename = os.path.join(db.path(), 'breakpoints.pickle')
> with file(filename, 'rb') as input:
      data = pickle.load(input)
>
> tools.tags.apply(data)
  1. Unfortunately, this will overwrite any tags in the current database with the name “break-calli”. If the user wants to map these tags to a different name, however, they can provide a dictionary of tag mappings as a keyword parameter to tools.tags.apply.
> tools.tags.apply(data, **{'break-calli': 'username.break-calli'})
>

4.4. Marking all functions that are “leaves”

This tutorial is somewhat “advanced”. Other than using tags as described in the prior tutorials, this will also discuss ways to use the combinators provided by this plugin.

  1. Knowing whether a function is a utility function that doesn’t call anything might reduce the time it takes a reverser to determine the complexity of a function. This plugin makes it pretty easy to do this thanks to the help of functions like function.down() or the combination of function.chunks.iterate() and instruction.is_call(). So, let’s use these tools to define a function that returns whether a function calls other functions or not.
> def has_children(ea):
      if len(func.down()) > 0:
          return True
      return False
>
> print has_children(h())
13
>
  1. One issue with using function.down() is since it only returns addresses that a function is capable of calling, it will still return False if the function we apply it to makes an indirect call. Let’s improve this by looking for any call via the following variation.
> def has_children(ea):
      res = []
      for ea in func.iterate(ea):
          if ins.is_call(ea):
              res.append(ea)
          continue
      return True if len(res) > 0 else False
>
  1. Another way to do this is via the combination of an anonymous function (lambda) and a list comprehension. This would look like the following code.
> has_children = lambda ea: True if len([ea for ea in func.iterate(ea) if ins.is_call(ea)]) > 0 else False
  1. Yet another way involves using the functional combinator component of this plugin (see Functional combinators). To assist with these types of one-liners, this plugin includes a number of combinators that can be combined to build the exact same function. If we combine the fpartial(), ifilter(), and some operators available via Python’s operator module with the fcompose() combinator we can implement our prior 2 implementations of the has_children() function with the following code.
> print "first we need to iterate through all addresses in function"
> func_iterator = func.iterate
>
> print "now we'll filter for all call instructions"
> func_callFilter = fcompose(func_iterator, fpartial(ifilter, ins.is_call))
>
> print "now we'll convert our ifilter into a list so we can count them"
> func_callLister = fcompose(func_callFilter, list)
>
> print "convert our list of call instructions into a count"
> func_callCounter = fcompose(func_callLister, len)
>
> print "now we want to return true if operator.lt(0, len(list( call_instructions )))"
> func_callComparison = fcompose(func_callCounter, fpartial(operator.lt, 0))
>
> print "this will now return true is the number of call instructions is > 0"
> has_children = func_callComparison
>
> print 'combined we have'
> has_children = fcompose(func.iterate, fpartial(ifilter, ins.is_call), list, len, fpartial(operator.lt, 0))
  1. The combination of these primitives can provide some potentially very powerful tools if a user chooses to use this method. Nonetheless, it is up to the user and their own personal preference. This function that we’ve created, has_children(), will now be used to tag all of the functions that have no children. To start out, however, let’s create another function that will tag a function with the tag “function-type” and the value “leaf” if has_children() returns False.
> def tag_if_leaf(ea):
      if has_children(ea):
          func.tag(ea, 'function-type', 'leaf')
      return
>
  1. In the prior tutorials we used the database.functions namespace to enumerate each function. In this case we’ll use another useful function in that is provided to us by the tools module. This functions is tools.map() and takes a callable as its first parameter. Normally, this callable will be passed an address for each function within the database. This callable will then be executed against every function similar to using database.functions(). One thing that is interesting about tools.map(), however, is that it has the ability to detect the type of callable that is passed to it. If the callable takes two parameters, it will assume that the user intended an index, and an address to be passed to it. This can be used to detect how far along tools.map() has processed. Let’s redefine the above tag_if_leaf() function again.
> def tag_if_leaf(index, ea, **kwargs):
      total = kwargs['total']
      print "Percentage complete: %f"% (index / float(total))
      if not has_children(ea):
          func.tag(ea, 'function-type', 'leaf')
          return (ea, False)
      return (ea, True)
>
  1. Now that we have a callable to pass to tools.map(), we can simply hand it our callable and proceses the entire database.
> total = len(db.functions())
> res = tools.map(tag_if_leaf, total=len(db.functions()))
  1. Now, not only are all the “leaf” functions tagged, the variable res contains a list of tuples containing each function’s address, and whether or not it has any children. Let’s convert this to a Python dict and tag any functions that contain only indirect calls. This way we can distinguish if any of these functions are wrappers that use virtual methods.
> children_lookup = dict(res)
> for ea in children_lookup:
      if children_lookup[ea] and len(func.down(ea)) == 0:
          func.tag(ea, 'function-type', 'virtual-wrapper')
      continue
>
  1. Now each function containing a call is tagged with “function-type” being equivalent to “leaf” or “virtual-wrapper” depending on whether no functions are called, or only indirect calls are made.

4.5. Using the user-interface (ui) module to map hotkeys to a function

This plugin wraps a number of the capabilities that are exposed by IDAPython. One of these capabilities are to be able to interact with IDA’s user-interface. This is exposed by the plugin via the ui module. This tutorial introduces some of the user-interface wrappers that are available and can be used to develop quick tools that the user needs as they’re reversing a target.

  1. When doing iterative changes to the different content within a database, it is common to perform actions such as apply a structure, convert a list of data items into an array, convert a range of bytes into a string, and so forth. As an example, when reversing C++ targets where runtime-type-information is available (RTTI), information about a class such as its size, number of virtual methods, etc. are available, but not packed into a structure by IDA. Most of this is done relative to whatever object the user is currently working with. The current object the user is working with can be identifed with the ui.current namespace. This can be combined with other functionality within the plugin in order to automate a large aspect of the reverser’s chores.
> help(ui.current)
Help on class current in module ui:

class current(__builtin__.object)
 |  This namespace contains tools for fetching information about the
 |  current selection state. This can be used to get the state of
 |  thigns that are currently selected such as the address, function,
 |  segment, clipboard, widget, or even the current window in use.
 |
 |  Class methods defined here:
 |
 |  address(cls) from __builtin__.type
 |      Return the current address.
 |
 |  color(cls) from __builtin__.type
 |      Return the color of the current item.
 |
 |  function(cls) from __builtin__.type
 |      Return the current function.
 |
 |  opnum(cls) from __builtin__.type
 |      Return the currently selected operand number.
 |
 |  segment(cls) from __builtin__.type
 |      Return the current segment.
 |
 |  selected = selection(*arguments, **keywords) from __builtin__.type
 |      Alias for `ui.current.selection`.
 |
 |  selection(cls) from __builtin__.type
 |      Return the current address range of whatever is selected
 |
 |  status(cls) from __builtin__.type
 |      Return the IDA status.
 |
 |  symbol(cls) from __builtin__.type
 |      Return the current highlighted symbol name.
 |
 |  widget(cls) from __builtin__.type
 |      Return the current widget that the mouse is hovering over.
 |
 |  window(cls) from __builtin__.type
 |      Return the current window that is being used.
  1. When an RTTI entry is identified by IDA, it looks like the following unstructured data. With a cross-reference existing at address 0x723324. This data isn’t presently of use to a reverser, and it would be much easier to use if it was structured. Normally the reverser would need to navigate to the “Structures” view, create the structure, and then use ‘Alt+Q’ to apply the structure to the address at 0x723308.
.rdata:00723308 BC 34 72 00           off_723308      dd offset str.CNxPrefDigSigDlg
.rdata:0072330C 68                                    db  68h ; h
.rdata:0072330D 01                                    db    1
.rdata:0072330E 00                                    db    0
.rdata:0072330F 00                                    db    0
.rdata:00723310 FF                                    db 0FFh ; ÿ
.rdata:00723311 FF                                    db 0FFh ; ÿ
.rdata:00723312 00                                    db    0
.rdata:00723313 00                                    db    0
.rdata:00723314 00                                    db    0
.rdata:00723315 00                                    db    0
.rdata:00723316 00                                    db    0
.rdata:00723317 00                                    db    0
.rdata:00723318 E0 55 42 00                           dd offset sub_4255E0
.rdata:0072331C 00                                    db    0
.rdata:0072331D 00                                    db    0
.rdata:0072331E 00                                    db    0
.rdata:0072331F 00                                    db    0
.rdata:00723320 00                                    db    0
.rdata:00723321 00                                    db    0
.rdata:00723322 00                                    db    0
.rdata:00723323 00                                    db    0
.rdata:00723324 C8 7C 7A 00                           dd offset const CNxPrefDigSigDlg::`RTTI Complete Object Locator'
  1. To deal with this, we will first create the structure at the IDAPython command prompt based on what we know. Depending on the C++ implementation, this structure can be named CRuntimeClass, begins with a pointer to a string, contains a pointer to the constructor, and ends with a reference to the object locator. We will use struc.new to create it and add some members.
> st = struc.new('CRuntimeClass')
> st.members.add('p_name_0', int)
> st.members.add('v_size_4', int)
> st.members.add('v_8', int)
> st.members.add('pf_constructor_c', int)
> st.members.add('p_parentClass_10', int)
> st.members.add('p_14', int)
> st.members.add('p_18', int)
> st.members.add('p_runtimeTypeInformation_1c', int)
  1. Now that we have a structure stored in the st variable, we can use db.set.structure to apply it to an address. After applying the structure, we can then name it with db.name, or even tag it using the db.tag function in order to search for it later. As db.set.structure returns the structure that was created, we will use this combined with db.get.string to create the new name for the address.
> def make_rtti(ea):
      res = db.set.structure(ea, st)
      name_address = res['p_name_0'].value
      name = db.get.string(name_address)
      # database.name(ea, 'gv', 'rtti', name, db.offset(ea))
      database.name(ea, 'rtti' + '_' + name)
      return res
  1. Now we can use this function, whilst passing the target address as a parameter, to automatically apply the structure to the given address. One issue with this, however, is that typically when dealing with a CRuntimeClass, the cross-reference that navigates you to the class will point to the p_runtimeTypeInformation_1c field which will require you to seek by -0x1c bytes in order to apply the structure. We can remedy this by assuming IDA will label the string, and use the db.a.prevlabel function to get an address pointing to the previously defined label.
> def make_rtti_from_reference(ea):
      name_address = db.a.prevlabel(ea)
      return make_rtti(name_address)
  1. Now that we have a way to apply the structure to the address identified by the previously labeled name, we can simply iterate through all the cross-references pointing to the “RTTI Object Locator”, and apply this function to it. However, what if we’re unsure that every single cross-reference has this particular format? If that’s turns out to be the case, then it might be better to do this on command instead of programmatically. We’ll first define a function that uses the current address via either the ui.current.address() function, or the h function.
> def apply_rtti_here():
      ea = h()
      res = make_rtti_from_reference(ea)
      constructor = res['pf_constructor_c'].value
      db.tag(ea, 'rtti.constructor', constructor)
  1. Now we have a function that when executed, will use the current address, and apply our structure to that given address. One issue with this is the need to type the function into IDAPython in order to execute it. In order to work around this, we can use the ui.keyboard.map() function to map it to a hotkey. We will use the ‘Ctrl+P’ hotkey.
> ui.keyboard.map('Ctrl-P', apply_rtti_here)
<capsule object "$valid$" at 0x7fe7812b51e0>
  1. At this point, we can simply navigate to a cross-reference pointing in front of a label that we want to convert to a structure, and then hit the ‘Ctrl+P’ hotkey to execute our function. This allows us to automate some aspect of our reversing in order to annotate the database belonging to the target. When we’re done, we can simply use the ui.keyboard.unmap() function to unmap our hotkey to restore IDA’s original functionality.
> ui.keyboard.unmap('Ctrl-P')
  1. This plugin includes a number of ways to interact with IDA’s user-interface. Please review the help for the ui module for some of its other capabilities.

4.6. Conclusion

There are a variety of different features available in this plugin that can allow users to automate different aspects of their reverse-engineering project. It is recommended by the author to explore the different modules by using Python’s help() function to see what is available.

This plugin was written with the intention of enabling a reverse-engineer to automate many issues that one may encounter while reversing without investing in too much development effort. The author hopes that these examples help demonstrate the flexibility that is provided by this plugin.