Deconstructing Word

Since Microsoft Word is being used more and more as the vehicle for infecting computers I decided to look into how to validate Word files (DOC) at the email gateway and reject those deemed undesired.

I am using Linux, Postfix, Amavis, Clam-AV, and SpamAssassin on the inbound email gateways.

I acquired a Detect-VBA perl script and was able to add it as a virus scanner in Amavis. This script was set to detect any macros in a Word file and declare the file as infected regardless of the nature of the macro. The script worked.

So I attempted to make a script to validate the Word file attachment. It failed when added to Amavis as a virus scanner. This failure was due to the way Amavis saves the attachments. So I had to do some research.

In the process of documenting the inbound email flow for a client I ended up reading quite a bit and decided I needed to change some of the gateway’s processing. Amavis is used in a post-queue processing configuration where I am responsible for message disposition notification. I wanted to move as much of this into the pre-queue processing as possible so as to make the sender responsible for message disposition notification.

I copied Amavis’s file extension list and placed it in Postfix’s header_checks. This works very fast but also has a few limitations, such as no scanning within archive files or double extension checks. This now pushes much of the “banned file” processing back onto the sender’s server for processing rather than me getting a notification.

The extension list I am blocking:

ade|adp|app|bas|bat|chm|cmd|com|cpl|crt|docm|exe|fxp|grp|hlp|hta|
inf|ins|iso|isp|js|jse|lnk|mda|mdb|mde|mdw|mdt|mdz|msc|msi|msp|mst|
ops|pcd|pif|prg|reg|scr|sct|shb|shs|vb|vbe|vbs|wsc|wsf|wsh|xlsm

You may notice that I am blocking the Microsoft Office files that are declared to have macros (DOCM, XLSM). This is due to the belief for this client that macros have little use in the business, especially via email.

The detect-vba.pl script, slightly tweaked for this client’s setup:

#!/usr/bin/perl -w
#technion@ lolware.net
# Detects vba macros containing blacklisted strings.
#
# Suggested maiad.conf config: 
#['Detect-VBA', #'/var/maia/scripts/detectvba.pl', "{}", 
# [0], qr / INFECTED / , qr / \bINFECTED(. + )\ b / m], 
#
use strict;
my $sigtool = '/usr/bin/sigtool'; #Clamav sigtool path

if ($# ARGV != 0) {
    print“ Please supply directory to scan\ n”;
    exit 0;
}
#Sanity check directory
my $dir = $ARGV[0];
if ($dir!~/^[a-z0-9A-Z\/-]+$/) {
    print“ Invalid directory passed\ n”;
    exit 0;
}
opendir DIR, $dir or die“ Cannot open dir $dir: $!”;
my@ files = readdir DIR;
foreach my $file(@files) {
    next
    if $file = ~/^\.$/;
    next
    if $file = ~/^\.\.$/;
    my $scan = `$sigtool –vba=”$dir/$file”`;
    if ($scan = ~/start of code/i) {
        print“ Scanning $file: INFECTED VBA\ n”;
        exit 1;
    } else {
        print“ Scanning $file: OK\ n”;
    }
}
closedir DIR;
exit 0;

This script depends on Clam-AV’s sigtool to extract the VBA portion of the file. Due to the nature of things this script is run against every attachment needed or not.

So that took care of any office files that had macros so long as Clam-AV was able to extract them.

So I then moved on to validating word file attachments. I wanted to check if that word file was really a word file.
This proved to be almost impossible via Amavis’s virus scanner option. I did write a script based on ideas I found in OLEMACRO SpamAssassin plugin and a few other places. Here’s what I came up with:

#!/usr/bin/perl

#----------------------------------------------------------
# Verify Word file is really a word file
#
# Andrew Haines <ahaines@sympatico.ca>
# Copyright 2018
#----------------------------------------------------------
# portions borrowed from detectvba.pl
# portions borrowed from File::Type
#----------------------------------------------------------
#
#Suggested amavisd.conf config:
#['Detect-Valid-Word-File',
# '/usr/local/bin/DetectValidWordFile.pl', "{}",
# [0], qr/FALSE/, qr/UNKNOWN/ ],
#
#----------------------------------------------------------

use strict;
use IO::File;

if ($#ARGV != 0) {
  print "Please supply directory to scan\n";
  exit 2;
}

#Sanity check directory
my $dir = $ARGV[0];
if ($dir !~ /^[a-z0-9A-Z\/-]+$/) {
  print "Invalid directory passed\n";
  exit 3;
}

opendir DIR, $dir or die "Cannot open dir $dir: $!";
my @files = readdir DIR;
my $cf = scalar @files;
my $hasbug = 0;

foreach my $file (@files) {
  #next if $file =~ /^\.$/;
  #next if $file =~ /^\.\.$/;
  next if $file !~ /\.doc$/i;

  my $fh = IO::File->new("$dir/$file") || die "unable to open file $file: $!";
  my $data;
  $fh->read($data, 16);
  $fh->close;
  my $t = 0;

  # OLE Compound File
  if ($data =~ m[^\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1]) {
    print "Scanning $file: OK\n";
    if ($cf<4) {exit 0;}
  } else {
    # DOCX or DOCM file
    if ($data =~ m[^PK\003\004]) {
      print "Scanning $file: FALSE (Should be DOCX or DOCM)\n";
      if ($cf<4) {exit 1;} else {$t = 1;}
    }

    # RTF File
    if ($data =~ m[^\x7b\x5c\x72\x74\x66]) {
      print "Scanning $file: FALSE (Should be RTF)\n";
      if ($cf<4) {exit 1;} else {$t = 1;}
    }

    # something else?
    if ($t != 1) {
      print "Scanning $file: UNKNOWN\n";}
    if ($cf<4) {exit 1;} else {$t = 1;}
  }
  if ($t == 1) {$hasbug = 1;}
}

closedir DIR;
if ($hasbug == 1) {exit 1;}
exit 0;

The script looks at the first 16 bytes of the file to determine its type. If it is really a Word DOC file then it should be an OLE Compound File. If it has been renamed then class it as a virus (i.e. DOCX, DOCM, RTF, HTML).
While the idea is sound, Amavis does not use the attachment name when it saves the attachment to disk so the script never actually checked the file.

While trying to solve a different issue with Outlook users and RTF (a.k.a. winmail.dat) I came across the mimedefang milter. Again quite a bit of reading was involved. That particular configuration just used mimedefang on an internal server to split attachments out of any winmail.dat file. It did however get me researching. Since I already have perl installed on all my gateways mimedefang would make a fairly easy change. So I started looking for examples where people looked at word files, only to find very little. So off to build it myself.

I did find some python scripts (i.e. oletools) that would assist with dissecting word files.
I found perl’s OLE::Storage_Lite module. I grabbed the documentation Microsoft published.
I did not find a perl implementation of the RLE compression used within Microsoft Office files so I converted the function from python (from oletools in this case).
After working through a few sample files (mostly samples of “bugs” received via email) I came up with the following script to deconstruct a Word file:

#!/usr/bin/perl

use strict;
use warnings;
use OLE::Storage_Lite;
use List::Util qw[min max];
use POSIX qw/ceil log2/;
use Data::Dumper;

my @files = (
        'YGTCQ3625195.doc'

#        'attachment (2).doc'
#        'GHZXC5725702.doc',
#        'YGTCQ3625195.doc',
#        'QSPMacroDefinitions.xls',
#        'integration.doc'
);
my $decompressed_container = "";
my $dir_container = "";
my $ThisDocument_Stream = "";
my $ThisDocument_SourceCode = "";

my $SysKindRecord;
my $LcidRecord;
my $LcidInvokeRecord;
my $CodePageRecord;
my $NameRecord;
my $DocStringRecord;
my $HelpFilePathRecord;
my $HelpContextRecord;
my $LibFlagsRecord;
my $VersionRecord;
my $ConstantsRecord;
my @ProjectReferences;
my @ModuleRecords;
my $ThisDocumentOffset = 0;
my %ModuleOffsets;
my %ModuleStreams;
my %ModuleSource;


for my $filename ( @files ) {

    printf( "%-20s = %s\n", $filename, dump_ole_filetype( $filename ) );

}

sub dump_ole_filetype {

    my $filename = shift;
    my $level = 1;
    my $bData = 1;

    # Check that the file exists.
    return 'not_found' if !-e $filename;

    # Create an OLE::Storage_Lite object to read the file.
    my $ole      = OLE::Storage_Lite->new( $filename );
    my $pps      = $ole->getPpsTree($bData);

    # If getPpsTree() failed then this isn't an OLE file.
    return 'not_ole_file' if !$pps;

    # Loop through the PPS children below the root.
    for my $child_pps ( @{ $pps->{Child} } ) {

        my $pps_no   = $child_pps->{No};
        my $pps_name = OLE::Storage_Lite::Ucs2Asc( $child_pps->{Name} );
        my $pps_type = $child_pps->{Type};

        printf("    %3d %2d %s\n", $pps_no, $pps_type, $pps_name );

        if ($pps_type == 1) {dump_child_dir ($child_pps, $level+1);}
    }

    return '';
}

sub dump_child_dir {
    my ($pps, $level) = @_;
    my $textbuf;
    my $c_rle;
    my %levels = ( 1 => '',
                   2 => '       ',
                   3 => '             '
                 );

    for my $child_pps ( @{ $pps->{Child} } ) {

        my $pps_no   = $child_pps->{No};
        my $pps_name = OLE::Storage_Lite::Ucs2Asc( $child_pps->{Name} );
        my $pps_type = $child_pps->{Type};

        printf(" %2d %s  | -- %3d %2d %s\n", $level, $levels{$level-1}, $pps_no, $pps_type, $pps_name );

        if ($pps_type == 1) {dump_child_dir ($child_pps, $level+1);}

        if ($pps_type == 2) {
            if ($pps_name eq 'dir') {
                # dir stream is compressed so must decompress it first then parse it
                $dir_container = decompress_rle_stream ($child_pps, 0, 0);
                parse_dir_stream( $dir_container );
                next;
            }
            if ($pps_name eq 'ThisDocument') {
                $ThisDocument_SourceCode = decompress_rle_stream( $child_pps, $ThisDocumentOffset, 1 );
                print $ThisDocument_SourceCode . "\n";
                next;
            }
            if ($pps_name eq '_VBA_PROJECT') {next;}
            if ($pps_name eq 'PROJECT') {next;}
            if ($pps_name eq 'PROJECTwm') {next;}
            if ($pps_name =~ /^__SRP_/) {next;}

            if($ModuleOffsets{$pps_name} != 0) {
              printf("processing module %s with offset %u\n", $pps_name, $ModuleOffsets{$pps_name} );
              $ModuleSource{$pps_name} = decompress_rle_stream( $child_pps, $ModuleOffsets{$pps_name}, 1 );
              print $ModuleSource{$pps_name} . "\n";
            }

        }

    }
}

sub decompress_rle_stream ($$$) {
    my $pps = shift;
    my $offset = shift;
    my $dump = shift;
    my $container = "";
    
    my $buf .= $pps->{Data};
    my $buf_len = length($buf);
    printf("PPS size = %s\n", $pps->{Size});
    printf("buffer length = %d\n", $buf_len);

    if($dump == 1) {
      # dump the buffer for debugging
      my $tempbufptr = 0;
      for ($tempbufptr = 0; $tempbufptr <= $buf_len; $tempbufptr+=32) {
        printf("buf (%0.4u) = %*v2.2x\n", $tempbufptr, ' ', substr($buf, $tempbufptr, 32));
      }
    }

    my $compressed_current = $offset;
    my $compressed_chunk_start = 0;
    my $decompressed_chunk_start = 0;
    my $compressed_chunk_header = 0;
    my $chunk_size = 0;
    my $chunk_is_compressed = 0;
    my $compressed_end = 0;
    my $flag_byte = 0;
    my $bit_index = 0;
    my $copy_token;

    my $sig_byte = ord(substr($buf, $compressed_current, 1));

    if ($sig_byte != 0x01) {
       printf("Invalid signature byte: %02x\n", $sig_byte);
       return (1, undef);
    }
    $compressed_current += 1;

    while ( $compressed_current < length($buf) ) {
        $compressed_chunk_start = $compressed_current;
        $compressed_chunk_header = unpack("v", substr($buf, $compressed_current, 2));

        $chunk_size = ($compressed_chunk_header & 0x0FFF) + 3;
        $chunk_is_compressed = ($compressed_chunk_header & 0x8000) >> 15; # 1 == compressed, 0 == uncompressed

        $compressed_end = min($buf_len, $compressed_current + $chunk_size);
        $compressed_current += 2;

        if ($chunk_is_compressed == 0) { # uncompressed
            $container .= substr($buf, $compressed_current, $compressed_current + 4096);
            $compressed_current += 4096;
            next;
        } else {
          $decompressed_chunk_start = length($container);
          while ($compressed_current < $compressed_end) {
            $flag_byte = ord(substr($buf, $compressed_current, 1));
            $compressed_current += 1;

            for ($bit_index = 0; $bit_index < 8; $bit_index++) {
                if ($compressed_current >= $compressed_end) {
                    last;
                }
                if (((1 << $bit_index) & $flag_byte) == 0) { # LiteralToken
                    $container .= substr($buf, $compressed_current, 1);
                    $compressed_current += 1;
                    next;
                }

                #
                # copy tokens
                #

                $copy_token = unpack("v", substr($buf, $compressed_current, 2));
                my($length_mask, $offset_mask, $bit_count, $maximum_length) = copytoken_help(length($container) - $decompressed_chunk_start);
                my $tlength = ($copy_token & $length_mask) + 3;
                my $temp1 = $copy_token & $offset_mask;
                my $temp2 = 16 - $bit_count;
                my $offset = ($temp1 >> $temp2) + 1;
                my $copy_source = length($container) - $offset;
                $container .= substr($container, $copy_source, $tlength);
                $compressed_current += 2;

             }
          }
        }

    }

    return $container;
}

sub copytoken_help($) {
    my $difference = shift;

    my $bit_count = int(ceil(log2($difference)));
    $bit_count = max($bit_count, 4);
    my $length_mask = 0xFFFF >> $bit_count;
    my $offset_mask = ~$length_mask;
    my $maximum_length = (0xFFFF >> $bit_count) + 3;
    return ($length_mask, $offset_mask, $bit_count, $maximum_length);
}


sub parse_dir_stream ($) {
    my($buf) = shift;
    my $bufptr = 0;
    my $tempbufptr = 0;
    my %ModuleRecord = ();
    my $buf_len = length($buf);
    my $modulestartptr;

    printf("dir stream size = %u\n", length($buf));

    # Top Level Structure of "dir" stream:
    # -------------------------------------------------
    # InformationRecord (variable) = PROJECTINFORMATION
    # ReferencesRecord (variable)  = PROJECTREFERENCES
    # ModulesRecord (variable)     = PROJECTMODULES
    # Terminator (2 bytes) = 0x0010
    # Reserved (4 bytes) = 0x00000000

    # PROJECTINFORMATION record:
    # -------------------------------------------------
    # SysKindRecord (10 bytes)     = PROJECTSYSKIND
    # LcidRecord (10 bytes)        = PROJECTLCID
    # LcidInvokeRecord (10 bytes)  = PROJECTLCIDINVOKE
    # CodePageRecord (8 bytes)     = PROJECTCODEPAGE
    # NameRecord (variable)        = PROJECTNAME
    # DocStringRecord (variable)   = PROJECTDOCSTRING
    # HelpFilePathRecord (variable)= PROJECTHELPFILEPATH
    # HelpContextRecord (10 bytes) = PROJECTHELPCONTEXT
    # LibFlagsRecord (10 bytes)    = PROJECTLIBFLAGS
    # VersionRecord (12 bytes)     = PROJECTVERSION
    # ConstantsRecord (variable)   = PROJECTCONSTANTS


    # PROJECTSYSKIND record:
    # --------------------------------------------------
    # Id (2 bytes) = 0x0001
    # Size (4 bytes) = 0x00000004
    # SysKind (4 bytes)
    $SysKindRecord = substr($buf, $bufptr, 10);
    printf("SysKind Record = %*v2.2x\n", ' ', $SysKindRecord );

    # PROJECTLCID record:
    # --------------------------------------------------
    # Id (2 bytes) = 0x0002
    # Size (4 bytes) = 0x00000004
    # Lcid (4 bytes)
    $LcidRecord = substr($buf, $bufptr+10, 10);
    printf("Lcid Record = %*v2.2x\n", ' ', $LcidRecord );

    # PROJECTLCIDINVOKE record:
    # --------------------------------------------------
    # Id (2 bytes) = 0x0014
    # Size (4 bytes) = 0x00000004
    # LcidInvoke (4 bytes)
    $LcidInvokeRecord = substr($buf, $bufptr+20, 10);
    printf("LcidInvokeKind Record = %*v2.2x\n", ' ', $LcidInvokeRecord );

    # PROJECTCODEPAGE record:
    # ---------------------------------------------------
    # Id (2 bytes) = 0x0003
    # Size (4 bytes) = 0x00000002
    # CodePage (2 bytes)
    $CodePageRecord = substr($buf, $bufptr+30, 8);
    printf("CodePage Record = %*v2.2x\n", ' ', $CodePageRecord );
    my $CodePageRecord_CodePage = unpack('v', substr($CodePageRecord, 6, 2));
    printf("Code page: %d\n", $CodePageRecord_CodePage);

    # PROJECTNAME record:
    # ---------------------------------------------------
    # Id (2 bytes) = 0x0004
    # SizeOfProjectName (4 bytes) = 1-128
    # ProjectName (variable)
    $tempbufptr = $bufptr+38;
    my $NameRecord_Id = unpack('v', substr($buf, $bufptr+38, 2));
    my $NameRecord_Size = unpack('V', substr($buf, $bufptr+40, 4));
    my $NameRecord_Name = substr($buf, $bufptr+44, $NameRecord_Size);
    $bufptr = 44 + $NameRecord_Size;
    printf("Project Name = %s\n", $NameRecord_Name);
    printf("ProjectNameRecord = %*v2.2x\n", ' ', substr($buf, $tempbufptr, $bufptr-$tempbufptr));

    # PROJECTDOCSTRING record:
    # ---------------------------------------------------
    # Id (2 bytes) = 0x0005
    # SizeOfDocString (4 bytes) <= 2000
    # DocString (variable)
    # Reserved (2 bytes) = 0x0040
    # SizeOfDocStringUnicode (4 bytes) must be even
    # DocStringUnicode (variable)
    my $DocStringRecord_id = unpack('v', substr($buf, $bufptr, 2));
    my $DocStringRecord_Size1 = unpack('V', substr($buf, $bufptr+2, 4));
    my $DocStringRecord_Doc1 = "";
    if ($DocStringRecord_Size1 > 0) {
       $DocStringRecord_Doc1 = substr($buf, $bufptr+6, $DocStringRecord_Size1);
    }
    my $DocStringRecord_Reserved = unpack('v', substr($buf, $bufptr+6+$DocStringRecord_Size1, 2));
    if($DocStringRecord_Reserved != 0x0040) {printf("DocStringRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $bufptr+6+$DocStringRecord_Size1));}
    $tempbufptr = $bufptr+6+$DocStringRecord_Size1+2;
    my $DocStringRecord_Size2 = unpack('V', substr($buf, $tempbufptr, 4));
    my $DocStringRecord_Doc2 = "";
    if ($DocStringRecord_Size2 > 0) {
       $DocStringRecord_Doc2 = substr($buf, $tempbufptr+4, $DocStringRecord_Size2);
    }
    $tempbufptr = $bufptr+6+$DocStringRecord_Size1+2+4+$DocStringRecord_Size2;
    if ($DocStringRecord_Size1 > 0) { printf("DocString = %s\n", $DocStringRecord_Doc1); }
    if ($DocStringRecord_Size2 > 0) { printf("DocStringUnicode = %s\n", OLE::Storage_Lite::Ucs2Asc($DocStringRecord_Doc2)); }
    printf("DocStringRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
    $bufptr = $tempbufptr;

    # PROJECTHELPFILEPATH record:
    # ---------------------------------------------------
    # Id (2 bytes) = 0x0006
    # SizeOfHelpFile1 (4 bytes) <= 260
    # HelpFile1 (variable)
    # Reserved (2 bytes) = 0x003D
    # SizeOfHelpFile2 (4 bytes) = SizeOfHelpFile1
    # HelpFile2 (variable)
    my $HelpFilePathRecord_id = unpack('v', substr($buf, $bufptr, 2));
    my $HelpFilePathRecord_Size1 = unpack('V', substr($buf, $bufptr+2, 4));
    my $HelpFilePathRecord_File1 = "";
    if ($HelpFilePathRecord_Size1 > 0) {
       $HelpFilePathRecord_File1 = substr($buf, $bufptr+6, $HelpFilePathRecord_Size1);
    }
    my $HelpFilePathRecord_Reserved = unpack('v', substr($buf, $bufptr+6+$HelpFilePathRecord_Size1, 2));
    if($HelpFilePathRecord_Reserved != 0x003D) {printf("HelpFilePathRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $bufptr+6+$HelpFilePathRecord_Size1));}
    $tempbufptr = $bufptr+6+$HelpFilePathRecord_Size1+2;
    my $HelpFilePathRecord_Size2 = unpack('V', substr($buf, $tempbufptr, 4));
    my $HelpFilePathRecord_File2 = "";
    if ($HelpFilePathRecord_Size2 > 0) {
       $HelpFilePathRecord_File2 = substr($buf, $tempbufptr+4, $HelpFilePathRecord_Size2);
    }
    $tempbufptr = $bufptr+6+$HelpFilePathRecord_Size1+2+4+$HelpFilePathRecord_Size2;
    printf("HelpFilePathRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
    $bufptr = $tempbufptr;

    # PROJECTHELPCONTEXT record
    # ----------------------------------------------------
    # Id (2 bytes) = 0x0007
    # Size (4 bytes) = 0x00000004
    # HelpContext (4 bytes)
    $HelpContextRecord = substr($buf, $bufptr, 10);
    printf("HelpContextRecord = %*v2.2x\n", ' ', $HelpContextRecord);
    $bufptr += 10;

    # PROJECTLIBFLAGS record:
    # ----------------------------------------------------
    # Id (2 bytes) = 0x0008
    # Size (4 bytes) = 0x00000004
    # ProjectLibFlags (4 bytes)
    $LibFlagsRecord = substr($buf, $bufptr, 10);
    printf("ProjectLibFlagsRecord = %*v2.2x\n", ' ', $LibFlagsRecord);
    $bufptr += 10;

    # PROJECTVERSION record
    # -----------------------------------------------------
    # Id (2 bytes) = 0x0009
    # Reserved (4 bytes) = 0x00000004
    # VersionMajor (4 bytes)
    # VersionMinor (2 bytes)
    $VersionRecord = substr($buf, $bufptr, 12);
    printf("VersionRecord = %*v2.2x\n", ' ', $VersionRecord);
    $bufptr += 12;

    # PROJECTCONSTANTS record:
    # -----------------------------------------------------
    # Id (2 bytes) = 0x000C
    # SizeOfConstants (4 bytes) <= 1015
    # Constants (variable)
    # Reserved (2 bytes) = 0x003C
    # SizeOfConstantsUnicode (4 bytes)
    # ConstantsUnicode (variable)
    my $ConstantsRecord_id = unpack('v', substr($buf, $bufptr, 2));
    my $ConstantsRecord_Size1 = unpack('V', substr($buf, $bufptr+2, 4));
    my $ConstantsRecord_Constants = "";
    if ($ConstantsRecord_Size1 > 0) {
       $ConstantsRecord_Constants = substr($buf, $bufptr+6, $ConstantsRecord_Size1);
    }
    my $ConstantsRecord_Reserved = unpack('v', substr($buf, $bufptr+6+$ConstantsRecord_Size1, 2));
    $tempbufptr = $bufptr+6+$ConstantsRecord_Size1+2;
    my $ConstantsRecord_Size2 = unpack('V', substr($buf, $tempbufptr, 4));
    my $ConstantsRecord_Constants2 = "";
    if ($ConstantsRecord_Size2 > 0) {
      $ConstantsRecord_Constants2 = substr($buf, $tempbufptr+4, $ConstantsRecord_Size2);
    }
    $tempbufptr = $bufptr+6+$ConstantsRecord_Size1+2+4+$ConstantsRecord_Size2;
    printf("ConstantsRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
    $bufptr = $tempbufptr;


    printf("References (debug) = %*v2.2x\n", ' ', substr($buf, $bufptr, 8));
    # PROJECTREFERENCES record:
    # -----------------------------------------------------
    # ReferenceArray (variable) = REFERENCE

    # REFERENCE record:
    # -----------------------------------------------------
    # NameRecord (variable)      = REFERENCENAME (optional)
    # ReferenceRecord (variable) = ?

    # find the MODULES start
    $modulestartptr = $bufptr;
    my $fok = 1;
    my $tempid = unpack('v', substr($buf, $modulestartptr, 2));
    while(($modulestartptr < $buf_len-1) and ($fok == 1)) {
      if($tempid != 0x000F) {
        $fok = 0;
      } else {
        $modulestartptr++;
        $tempid = unpack('v', substr($buf, $modulestartptr, 2));
      }
    }

    $fok = 1; 
    # read the ID since each record has it first
    # array ends when MODULES start
    $tempid = unpack('v', substr($buf, $bufptr, 2));
    while (($tempid != 0x000F) and ($fok==1)) {
      printf("tempid = %0.4x\n", $tempid);
      $fok = 0;
      printf("debug (%u) = %*v2.2x\n", $bufptr, ' ', substr($buf, $bufptr, 48));
      
      if ($tempid == 0x0016) {
        # REFERENCENAME record:
        # -----------------------------------------------------
        # Id (2 bytes) = 0x0016
        # SizeOfName (4 bytes)
        # Name (variable)
        # Reserved (2 bytes) = 0x003E
        # SizeOfNameUnicode (4 bytes)
        # NameUnicode (variable)
        my $ismessedup = 0;
        my $noffset1 = 6;
        my $noffset2 = 4;
        # experience has shown these records to be messed up in the real-world
        # sometimes size is 4 bytes, sometimes only 2 bytes
        # sometimes the size is off

        my $ReferenceNameRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ReferenceNameRecord_Size1 = unpack('V', substr($buf, $bufptr+2, 4));
        if ($ReferenceNameRecord_Size1 > 0x00000FFF) {
           $ismessedup = 1;
           $ReferenceNameRecord_Size1 = unpack('v', substr($buf, $bufptr+2, 2));
           $noffset1 = 4;
        }
        my $ReferenceNameRecord_Name1 = "";
        if ($ReferenceNameRecord_Size1 > 0) {
           $ReferenceNameRecord_Name1 = substr($buf, $bufptr+$noffset1, $ReferenceNameRecord_Size1);
           printf("ReferenceNameRecord_Name = %s\n", $ReferenceNameRecord_Name1);
        }
        my $ReferenceNameRecord_Reserved = unpack('v', substr($buf, $bufptr+$noffset1+$ReferenceNameRecord_Size1, 2));
        $tempbufptr = $bufptr+$noffset1+$ReferenceNameRecord_Size1+2;
        if ($ReferenceNameRecord_Reserved != 0x003E) {
           # since the reserved bytes are not correct, find them
           # check the next 2 bytes
           if (unpack('v', substr($buf, $bufptr+$noffset1+$ReferenceNameRecord_Size1+2, 2)) == 0x003E) {
             $ReferenceNameRecord_Reserved = unpack('v', substr($buf, $bufptr+$noffset1+$ReferenceNameRecord_Size1+2, 2));
             $ReferenceNameRecord_Name1 = substr($buf, $bufptr+$noffset1, $ReferenceNameRecord_Size1+2); 
             $ReferenceNameRecord_Size1 += 2;
             $tempbufptr += 2;
             printf("ReferenceNameRecord_Name = %s\n", $ReferenceNameRecord_Name1);
           }
        }
        my $ReferenceNameRecord_Size2 = unpack('V', substr($buf, $tempbufptr, 4));
        if ($ReferenceNameRecord_Size2 > 0x00000FFF) {
           $ismessedup = 1;
           $ReferenceNameRecord_Size2 = unpack('v', substr($buf, $tempbufptr, 2));
           $noffset2 = 2;
        }
        my $ReferenceNameRecord_Name2 = "";
        if ($ReferenceNameRecord_Size2 > 0) {
           $ReferenceNameRecord_Name2 = substr($buf, $tempbufptr+$noffset2, $ReferenceNameRecord_Size2);
        }
        $tempbufptr = $bufptr+$noffset1+$ReferenceNameRecord_Size1+2+$noffset2+$ReferenceNameRecord_Size2;
        printf("ReferenceNameRecord size = %u, %u\n", $ReferenceNameRecord_Size1, $ReferenceNameRecord_Size2);
        printf("ReferenceNameRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
        push(@ProjectReferences, substr($buf, $bufptr, $tempbufptr-$bufptr));
        $bufptr = $tempbufptr;
        $fok = 1;
        if($bufptr > $modulestartptr) {$bufptr = $modulestartptr; $fok=0;}
      }

      if ($tempid == 0x002F) {
        # REFERENCECONTROL record:
        # -----------------------------------------------------
        # Id (2 bytes) = 0x002F
        # ...
      }

      if ($tempid == 0x0033) {
        # REFERENCEORIGINAL record:
        # -----------------------------------------------------
        # Id (2 bytes) = 0x0033
        # SizeOfLibidOriginal (4 bytes)
        # LibidOriginal (variable)
        my $ReferenceOriginalRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ReferenceOriginalRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ReferenceOriginalRecord_Libid = "";
        if ($ReferenceOriginalRecord_Size > 0) {
           $ReferenceOriginalRecord_Libid = substr($buf, $bufptr+6, $ReferenceOriginalRecord_Size);
        }
        $tempbufptr = $bufptr+6+$ReferenceOriginalRecord_Size; 
        printf("ReferenceOriginalRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
        push(@ProjectReferences, substr($buf, $bufptr, $tempbufptr-$bufptr));
        $bufptr = $tempbufptr;
        $fok = 1;
      }

      if ($tempid == 0x000D) {
        # REFERENCEREGISTERED record:
        # ------------------------------------------------------
        # Id (2 bytes) = 0x000D
        # Size (4 bytes)
        # SizeOfLibid (4 bytes)
        # Libid (variable)
        # Reserved1 (4 bytes) = 0x00000000
        # Reserved2 (2 bytes) = 0x0000
        my $ReferenceRegisteredRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ReferenceRegisteredRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ReferenceRegisteredRecord_Size2 = unpack('V', substr($buf, $bufptr+6, 4));
        printf("Size = %u, Size2 = %u\n", $ReferenceRegisteredRecord_Size, $ReferenceRegisteredRecord_Size2);
        my $ReferenceRegisteredRecord_Libid = "";
        if ($ReferenceRegisteredRecord_Size2 > 0) {
           $ReferenceRegisteredRecord_Libid = substr($buf, $bufptr+10, $ReferenceRegisteredRecord_Size2);
        }
        printf("Libid = %*v2.2x\n", ' ', $ReferenceRegisteredRecord_Libid);
        $tempbufptr = $bufptr+2+4+4+$ReferenceRegisteredRecord_Size2;
        my $ReferenceRegisteredRecord_Reserved1 = unpack('V', substr($buf, $tempbufptr, 4));
        my $ReferenceRegisteredRecord_Reserved2 = unpack('v', substr($buf, $tempbufptr+4, 2));
        $tempbufptr = $bufptr+2+4+$ReferenceRegisteredRecord_Size;
        #$tempbufptr = $bufptr+2+4+4+$ReferenceRegisteredRecord_Size2+4+2;
        printf("ReferenceRegisteredRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
        push(@ProjectReferences, substr($buf, $bufptr, $tempbufptr-$bufptr));
        $bufptr = $tempbufptr;
        $fok = 1;
      }

      if ($tempid == 0x000E) {
        # REFERENCEPROJECT record:
        # ------------------------------------------------------
        # Id (2 bytes) = 0x000E
        # Size (4 bytes)
        # SizeOfLibidAbsolute (4 bytes)
        # LibidAbsolute (variable)
        # SizeOfLibidRelative (4 bytes)
        # LibidRelative (variable)
        # MajorVersion (4 bytes)
        # MinorVersion (2 bytes)
        my $ReferenceProjectRecord_ismessedup = 0;
        my $ReferenceProjectRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ReferenceProjectRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        if ($ReferenceProjectRecord_Size > 0x00000FFF) {
           $ReferenceProjectRecord_ismessedup = 1;
           $ReferenceProjectRecord_Size = unpack('v', substr($buf, $bufptr+2, 2));
           if($ReferenceProjectRecord_Size == 0x0020) {
             $tempbufptr = $bufptr+2+2+$ReferenceProjectRecord_Size;
             if (unpack('v', substr($buf, $tempbufptr, 2)) == 0x0008) { $tempbufptr += 2; };
             if (unpack('v', substr($buf, $tempbufptr, 2)) == 0x0000) { $tempbufptr += 2; };
             printf("ReferenceProjectRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
             push(@ProjectReferences, substr($buf, $bufptr, $tempbufptr-$bufptr));
             $bufptr = $tempbufptr;
           }
        }
        if($ReferenceProjectRecord_ismessedup == 0) {
          my $ReferenceProjectRecord_SizeA = unpack('V', substr($buf, $bufptr+6, 4));
          my $ReferenceProjectRecord_LibidA = "";
          if ($ReferenceProjectRecord_SizeA > 0) {
             $ReferenceProjectRecord_LibidA = substr($buf, $bufptr+10, $ReferenceProjectRecord_SizeA);
          }
          $tempbufptr = $bufptr+10+$ReferenceProjectRecord_SizeA;
          my $ReferenceProjectRecord_SizeR = unpack('V', substr($buf, $tempbufptr, 4));
          my $ReferenceProjectRecord_LibidR = "";
          if ($ReferenceProjectRecord_SizeR > 0) {
             $ReferenceProjectRecord_LibidR = substr($buf, $tempbufptr+4, $ReferenceProjectRecord_SizeR);
          }
          $tempbufptr = $bufptr+10+$ReferenceProjectRecord_SizeA+4+$ReferenceProjectRecord_SizeR;
          my $ReferenceProjectRecord_Major = unpack('V', substr($buf, $tempbufptr, 4));
          my $ReferenceProjectRecord_Minor = unpack('v', substr($buf, $tempbufptr+4, 2));
          $tempbufptr = $bufptr+10+$ReferenceProjectRecord_SizeA+4+$ReferenceProjectRecord_SizeR+4+2;

          printf("ReferenceProjectRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, $tempbufptr-$bufptr));
          push(@ProjectReferences, substr($buf, $bufptr, $tempbufptr-$bufptr));
          $bufptr = $tempbufptr;
        }
        $fok = 1;
      }

      if($fok == 0) {
        $bufptr += 1;
        $fok = 1;
      }

      $tempid = unpack('v', substr($buf, $bufptr, 2));

      if(($bufptr >= length($buf)) or ($bufptr >= $modulestartptr)) { $fok = 0; }
    }


    print(">>> begin PROJECTMODULES decoding <<<\n");

    # PROJECTMODULES record:
    # ------------------------------------------------------
    # Id (2 bytes) = 0x000F
    # Size (4 bytes) = 0x00000002
    # Count (2 bytes) = number of modules
    # ProjectCookieRecord (8 bytes) = PROJECTCOOKIE
    # Modules (variable)            = MODULE
    my $ProjectModulesRecord_id = unpack('v', substr($buf, $bufptr, 2));
    my $ProjectModulesRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
    my $ProjectModulesRecord_Count = unpack('v', substr($buf, $bufptr+6, 2));
    my $ProjectModulesRecord_Cookie = substr($buf, $bufptr+8, 8);
    printf("Number of Modules = %u\n", $ProjectModulesRecord_Count);
    my $modulecount = $ProjectModulesRecord_Count;
    printf("ProjectModulesRecord = %*v2.2x\n", ' ', substr($buf, $bufptr, 8));

    # PROJECTCOOKIE record:
    # ------------------------------------------------------
    # Id (2 bytes) = 0x0013
    # Size (4 bytes) = 0x00000002
    # Cookie (2 bytes)
    my $ProjectCookieRecord_id = unpack('v', substr($ProjectModulesRecord_Cookie, 0, 2));
    my $ProjectCookieRecord_Size = unpack('V', substr($ProjectModulesRecord_Cookie, 2, 4));
    my $ProjectCookieRecord_Cookie = unpack('v', substr($ProjectModulesRecord_Cookie, 6, 2));
    printf("ProjectCookieRecord = %*v2.2x\n", ' ', substr($buf, $bufptr+8, 8));

    $bufptr += 16;

    # dump the rest of the buffer for debugging
    $tempbufptr = $bufptr;
    my $buf_len = length($buf);
    for ($tempbufptr = $bufptr; $tempbufptr <= $buf_len; $tempbufptr+=32) {
      printf("buf (%0.4u) = %*v2.2x\n", $tempbufptr, ' ', substr($buf, $tempbufptr, 32));
    }

    $tempid = unpack('v', substr($buf, $bufptr, 2));
    $fok = 1;
    my $isThisDocument = 0;
    while (($tempid != 0x002B) and ($tempid != 0x462B) and ($fok==1)) {
      printf("tempid = %0.4x\n", $tempid);
      $fok = 0;
      printf("debug (%u) = %*v2.2x\n", $bufptr, ' ', substr($buf, $bufptr, 48));

      # MODULE record:
      # ------------------------------------------------------
      # NameRecord (variable)         = MODULENAME
      # NameUnicodeRecord (variable)  = MODULENAMEUNICODE
      # StreamNameRecord (variable)   = MODULESTREAMNAME
      # DocStringRecord (variable)    = MODULEDOCSTRING
      # OffsetRecord (10 bytes)       = MODULEOFFSET
      # HelpContextRecord (10 bytes)  = MODULEHELPCONTEXT
      # CookieRecord (8 bytes)        = MODULECOOKIE
      # TypeRecord (6 bytes)          = MODULETYPE
      # ReadOnlyRecord (6 bytes)      = MODULEREADONLY (optional)
      # PrivateRecord (6 bytes)       = MODULEPRIVATE (optional)
      # Terminator (2 bytes) = 0x002B
      # Reserved (4 bytes) = 0x00000000

      if (($tempid == 0x0019) or ($tempid == 0x6C19)) {
        %ModuleRecord = (
                         'ModuleName_0019' => "",
                         'ModuleName_0047' => "",
                         'ModuleStreamName_001A' => "",
                         'ModuleOffset_0031' => 0
                        );  
        # MODULENAME record:
        # -------------------------------------------------------
        # Id (2 bytes) = 0x0019
        # SizeOfModuleName (4 bytes)
        # ModuleName (variable)
        my $ModuleNameRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleNameRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleNameRecord_Name = "";
        $tempbufptr = $bufptr+2+4;

        # scan ahead to find next entry = 0x0047
        $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        while (($tempbufptr <= $buf_len) and ($tempid != 0x0047)) {
          $ModuleNameRecord_Name .= substr($buf, $tempbufptr, 1);
          $tempbufptr++;
          $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        }
        $bufptr = $tempbufptr;
        $fok = 1;
        $ModuleRecord{'ModuleName_0019'} = $ModuleNameRecord_Name;
        printf("Module Name 0019 = %s\n", $ModuleNameRecord_Name);
        if($ModuleNameRecord_Name eq "ThisDocument") {$isThisDocument = 1; print "ThisDocument??\n";}
      }

      if ($tempid == 0x0047) {
        # MODULENAMEUNICODE record:
        # --------------------------------------------------------
        # Id (2 bytes) = 0x0047
        # SizeOfModuleNameUnicode (4 bytes)
        # ModuleNameUnicode (variable)
        my $ModuleNameUnicodeRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleNameUnicodeRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleNameUnicodeRecord_Name = "";
        $tempbufptr = $bufptr+2+4;

        # scan ahead to find next entry = 0x001A
        $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        while (($tempbufptr <= $buf_len) and ($tempid != 0x001A) and ($tempid != 0x6C1A)) {
          $ModuleNameUnicodeRecord_Name .= substr($buf, $tempbufptr, 1);
          $tempbufptr++;
          $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        }
        $bufptr = $tempbufptr; 
        $fok = 1;
        $ModuleRecord{'ModuleName_0047'} = $ModuleNameUnicodeRecord_Name;
      }

      if (($tempid == 0x001A) or ($tempid == 0x6C1A)) {
        # MODULESTREAMNAME record:
        # ---------------------------------------------------------
        # Id (2 bytes) = 0x001A
        # SizeOfStreamName (4 bytes)
        # StreamName (variable)
        # Reserved (2 bytes) 0x0032
        # SizeOfStreamNameUnicode (4 bytes)
        # StreamNameUnicode (variable)
        my $ModuleStreamNameRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleStreamNameRecord_Size1 = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleStreamNameRecord_Name1 = "";

        $tempbufptr = $bufptr+6;
        # scan ahead to find Reserved entry = 0x0032
        $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        while (($tempbufptr <= $buf_len) and ($tempid != 0x0032)) {
          $ModuleStreamNameRecord_Name1 .= substr($buf, $tempbufptr, 1);
          $tempbufptr++;
          $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        }
        my $ModuleStreamNameRecord_Reserved = unpack('v', substr($buf, $tempbufptr, 2));
        my $ModuleStreamNameRecord_Size2 = unpack('V', substr($buf, $tempbufptr+2, 4));
        $tempbufptr += 6;
        my $ModuleStreamNameRecord_Name2 = "";
        # scan ahead to find next entry = 0x001C
        $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        while (($tempbufptr <= $buf_len) and ($tempid != 0x001C) and ($tempid != 0x461C)) {
          $ModuleStreamNameRecord_Name2 .= substr($buf, $tempbufptr, 1);
          $tempbufptr++;
          $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        }
        $bufptr = $tempbufptr;
        $fok = 1;
        $ModuleRecord{'ModuleStreamName_001A'} = $ModuleStreamNameRecord_Name1;
        printf("Module Stream Name 001A = %s\n", $ModuleStreamNameRecord_Name1);
        if($ModuleStreamNameRecord_Name1 eq "ThisDocument") {$isThisDocument = 1; print "ThisDocument??\n";}
      }

      if (($tempid == 0x001C) or ($tempid == 0x461C)) {
        # MODULEDOCSTRING record:
        # ---------------------------------------------------------
        # Id (2 bytes) = 0x001C
        # SizeOfDocString (4 bytes)
        # DocString (variable)
        # Reserved (2 bytes) = 0x0048
        # SizeOfDocStringUnicode (4 bytes)
        # DocStringUnicode (variable)
        my $ModuleDocStringRecord = "";
        my $ModuleDocStringRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleDocStringRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleDocStringRecord_Doc1 = "";
        #my $ModuleDocStringRecord_Doc1 = substr($buf, $bufptr+6, $ModuleDocStringRecord_Size);
        #my $ModuleDocStringRecord_Reserved = unpack('v', substr($buf, $bufptr+6+$ModuleDocStringRecord_Size, 2));
        #$tempbufptr = $bufptr+6+$ModuleDocStringRecord_Size;
        #my $ModuleDocStringRecord_Size2 = unpack('V', substr($buf, $tempbufptr, 4));
        #my $ModuleDocStringRecord_Doc2 = "";
        #my $ModuleDocStringRecord_Doc2 = substr($buf, $tempbufptr+4, $ModuleDocStringRecord_Size2);
        $tempbufptr += 6;
        $ModuleDocStringRecord = substr($buf, $bufptr, 6);

        # scan ahead to find next entry = 0x0031
        $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        while (($tempbufptr <= $buf_len) and ($tempid != 0x0031)) {
          $ModuleDocStringRecord .= substr($buf, $tempbufptr, 1);
          $tempbufptr++;
          $tempid = unpack('v', substr($buf, $tempbufptr, 2)); 
        }

        $bufptr = $tempbufptr;
        $fok = 1;
      }

      if ($tempid == 0x0031) {
        # MODULEOFFSET record:
        # ----------------------------------------------------------
        # Id (2 bytes) = 0x0031
        # Size (4 bytes) = 0x00000004
        # TextOffset (4 bytes)
        my $ModuleOffsetRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleOffsetRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleOffsetRecord_Offset = unpack('V', substr($buf, $bufptr+6, 4));
        $bufptr += 10;
        $fok = 1;
        $ModuleRecord{'ModuleOffset_0031'} = $ModuleOffsetRecord_Offset;
        printf("Module Offset = %u\n", $ModuleOffsetRecord_Offset);
        if($isThisDocument == 1) { $ThisDocumentOffset = $ModuleOffsetRecord_Offset; $isThisDocument = 0; print ThisDocument??\n";}
      }

      if ($tempid == 0x001E) {
        # MODULEHELPCONTEXT record:
        # -----------------------------------------------------------
        # Id (2 bytes) = 0x001E
        # Size (4 bytes) - 0x00000004
        # HelpContext (4 bytes)
        my $ModuleHelpRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleHelpRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleHelpRecord_Context = unpack('V', substr($buf, $bufptr+6, 4));
        $bufptr += 10;
        $fok = 1;
      }

      if ($tempid == 0x002C) {
        # MODULECOOKIE record:
        # -----------------------------------------------------------
        # Id (2 bytes) = 0x002C
        # Size (4 bytes) = 0x00000002
        # Cookie (2 bytes)
        my $ModuleCookieRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleCookieRecord_Size = unpack('V', substr($buf, $bufptr+2, 4));
        my $ModuleCookieRecord_Cookie = unpack('v', substr($buf, $bufptr+6, 2));
        $bufptr += 8;
        $fok = 1;
      }

      if (($tempid == 0x0021) or ($tempid == 0x0022) or ($tempid == 0x4621) or ($tempid == 0x4622)) {
        # MODULETYPE record:
        # ------------------------------------------------------------
        # Id (2 bytes) = 0x0021 or 0x0022
        # Reserved (4 bytes) = 0x00000000
        my $ModuleTypeRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleTypeRecord_Reserved = unpack('V', substr($buf, $bufptr+2, 4));
        $bufptr += 6;
        $fok = 1;
      }

      if ($tempid == 0x0025) {
        # MODULEREADONLY record:
        # -------------------------------------------------------------
        # Id (2 bytes) = 0x0025
        # Reserved (4 bytes) = 0x00000000
        my $ModuleReadOnlyRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModuleReadOnlyRecord_Reserved = unpack('V', substr($buf, $bufptr+2, 4));
        $bufptr += 6;
        $fok = 1;
      }

      if ($tempid == 0x0028) {
        # MODULEPRIVATE record:
        # -------------------------------------------------------------
        # Id (2 bytes) = 0x0028
        # Reserved (4 bytes) = 0x00000000
        my $ModulePrivateRecord_id = unpack('v', substr($buf, $bufptr, 2));
        my $ModulePrivateRecord_Reserved = unpack('V', substr($buf, $bufptr+2, 4));
        $bufptr += 6;
        $fok = 1;
      }

      if($fok == 0) {
        $bufptr += 1;
        $fok = 1;
      }

      $tempid = unpack('v', substr($buf, $bufptr, 2));
      if(($tempid == 0x002B) or ($tempid == 0x462B)) {
        my $ModuleReserved = unpack('V', substr($buf, $bufptr+2, 4));
        $bufptr += 6;
        if($bufptr >= length($buf)) {$fok = 0;}

        # add an entry to @ModuleRecords
        push(@ModuleRecords, %ModuleRecord);

        print Dumper(%ModuleRecord) . "\n";
        printf("Module: %s : %u\n", $ModuleRecord{'ModuleStreamName_001A'}, $ModuleRecord{'ModuleOffset_0031'} );
        $ModuleOffsets{ $ModuleRecord{'ModuleStreamName_001A'} } = $ModuleRecord{'ModuleOffset_0031'};
        $tempid = unpack('v', substr($buf, $bufptr, 2));
      }
      if($bufptr >= length($buf)-1) { $fok = 0; }
    }

    print "Modules\n";
    print "=======\n";
    print Dumper(@ModuleRecords) . "\n";
}


sub parse_module_stream ($$$) {
    my $stream = shift;
    my $offset = shift;
    my $dump = shift;

    my $bufptr = 0;
    my $buf_len = length($stream);

    my $sourcecode = "";

    if($dump == 1) {
      printf("len = %u, offset = %u\n", $buf_len, $offset);
      # dump the buffer for debugging
      my $tempbufptr = 0;
      for ($tempbufptr = 0; $tempbufptr <= $buf_len; $tempbufptr+=32) {
        printf("stream (%0.4u) = %*v2.2x\n", $tempbufptr, ' ', substr($stream, $tempbufptr, 32));
      }
    }

    # Top level of a module stream
    # -------------------------------------------------------------
    # PerformanceCache (variable)
    # CompressedSourceCode (variable)

    # PerformanceCache is MODULEOFFSET bytes in size

    $sourcecode = substr($stream, $offset+1, $buf_len-$offset);

    return $sourcecode;
}

dump-ole-storage-recursive.pl
There is quite a bit of debugging code still in this script but it serves the learning process for me.
This script will extract and display the macro text in a Word file.
It is pure perl, no external dependencies (i.e. clam-av).

Now to take what I've learned with deconstructing Word DOC files and convert it into a procedure I can use via mimedefang.
The goal is to reject Word DOC files that have macros, reject Word DOC files that are not really DOC files but are renamed DOCX, DOCM, RTF, HTML, etc.
After that it would be nice to look into the Word DOC content to find if any remote content is needed. For example, a frame that downloads an RTF file that in turn downloads a malicious program. While this activity should be caught by the desktop's anti-virus system, the original file is still considered to be clean. With the rapid changes, anti-virus vendors can not keep up so it is better to have multiple methods and layers. While I may block some legitimate content, those parties have an alternative delivery mechanism available. I'll post my mimedefang script when I get it working.

Bookmark the permalink.

Comments are closed.