Source: CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm

Encoding x86 Instructions

1. x86 Instructions Overview

x86 Instruction Encoding:

x86 Instruction Encoding

Although the diagram seems to imply that instructions can be up to 16 bytes long, in actuality the x86 will not allow instructions greater than 15 bytes in length.

The prefix bytes are not the opcode expansion prefix discussed earlier - they are special bytes to modify the behavior of existing instructions.

2. x86 Instruction Format Reference

Another view of the x86 instruction format:

The x86 instruction format

Additional reference:

3. x86 Opcode Sizes

The x86 CPU supports two basic opcode sizes:
  1. standard one-byte opcode
  2. two-byte opcode consisting of a 0Fh opcode expansion prefix byte followed by a second byte that specifies the actual instruction.
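
The two opcode sizes can be told apart by looking at the first opcode byte. The following is a minimal sketch, not a full decoder: the function name opcode_length is made up for illustration, and it assumes that any instruction prefix bytes have already been stripped from the input.

```python
def opcode_length(code: bytes) -> int:
    """Return the opcode size in bytes: 2 if it starts with 0Fh, else 1.

    Assumption: `code` starts at the opcode, with prefix bytes removed.
    """
    if not code:
        raise ValueError("empty instruction")
    # 0Fh is the opcode expansion prefix; the next byte selects the instruction.
    return 2 if code[0] == 0x0F else 1


print(opcode_length(bytes([0x01, 0xC1])))        # ADD ECX, EAX (one-byte opcode)
print(opcode_length(bytes([0x0F, 0xAF, 0xC1])))  # IMUL EAX, ECX (two-byte opcode)
```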

3.1. x86 ADD Instruction Opcode

x86 ADD instruction opcode:

x86 ADD Opcode

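Bit number one, marked d, specifies the direction of the data transfer:

  1. If d = 0, the REG field is the source and the MOD R/M operand is the destination.
  2. If d = 1, the REG field is the destination and the MOD R/M operand is the source.

Bit number zero, marked s, specifies the size of the operands the ADD instruction operates upon:

  1. If s = 0, the operands are 8-bit.
  2. If s = 1, the operands are 16-bit or 32-bit.

You'll soon see that this direction bit d creates a problem that results in one instruction having two different possible opcodes.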
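
As a rough illustration, the ADD opcode byte can be assembled from the d and s bits and taken apart again. This sketch assumes the opcode layout 000000ds (six zero bits followed by d and s), which matches the ADD opcodes 00h through 03h in the Intel opcode map; the helper names are invented for this example.

```python
def add_opcode(d: int, s: int) -> int:
    """Assemble the one-byte ADD opcode 000000ds from its d and s bits."""
    if d not in (0, 1) or s not in (0, 1):
        raise ValueError("d and s must each be 0 or 1")
    return (d << 1) | s


def split_add_opcode(opcode: int) -> tuple:
    """Return (d, s): bit 1 and bit 0 of the opcode byte."""
    return (opcode >> 1) & 1, opcode & 1


for d in (0, 1):
    for s in (0, 1):
        print(f"d={d} s={s} -> opcode {add_opcode(d, s):02X}h")
```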

4. Encoding x86 Instruction Operands, MOD-REG-R/M Byte

The MOD-REG-R/M byte specifies instruction operands and their addressing mode(*):
MOD-REG-R/M Byte

The MOD field specifies x86 addressing mode:

MOD  Meaning
===  ================================================================
 00  Register indirect addressing mode, or SIB with no displacement
     (when R/M = 100), or displacement-only addressing mode
     (when R/M = 101).
 01  One-byte signed displacement follows addressing mode byte(s).
 10  Four-byte signed displacement follows addressing mode byte(s).
 11  Register addressing mode.

The REG field specifies source or destination register:

REG Value  8-bit data  16-bit data  32-bit data
=========  ==========  ===========  ===========
   000     al          ax           eax
   001     cl          cx           ecx
   010     dl          dx           edx
   011     bl          bx           ebx
   100     ah          sp           esp
   101     ch          bp           ebp
   110     dh          si           esi
   111     bh          di           edi

The R/M field, combined with MOD, specifies either

  1. the second operand in a two-operand instruction, or
  2. the only operand in a single-operand instruction like NOT or NEG.

The d bit in the opcode determines which operand is the source and which is the destination.

(*) Technically, registers do not have an address, but we apply the term addressing mode to registers nonetheless.
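
The three fields can be pulled out of a MOD-REG-R/M byte with shifts and masks. The sketch below assumes the usual bit layout (MOD in bits 7-6, REG in bits 5-3, R/M in bits 2-0); the function name split_modrm is made up for this example.

```python
# Register names indexed by the 3-bit REG value, for 32-bit data.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte: int) -> tuple:
    """Split a MOD-REG-R/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6
    reg = (byte >> 3) & 0b111  # bits 5-3
    rm = byte & 0b111          # bits 2-0
    return mod, reg, rm


mod, reg, rm = split_modrm(0xC1)
print(f"mod={mod:02b} reg={reg:03b} ({REG32[reg]}) rm={rm:03b} ({REG32[rm]})")
```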

5. General-Purpose Registers

16-bit general-purpose registers

Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most-frequently used data in registers.

6. REG Field of the MOD-REG-R/M Byte

See MOD-REG-R/M Byte.

Depending on the instruction, the REG field can be either the source or the destination operand.

Many instructions have the d (direction) field in their opcode, which selects the role of the REG operand:

  1. If d=0, REG is the source,
    MOD R/M <- REG.
  2. If d=1, REG is the destination,
    REG <- MOD R/M.

(*) For certain (often single-operand or immediate-operand) instructions, the REG field may contain an opcode extension rather than the register bits. In such a case, the R/M field specifies the operand.
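
One consequence of the d bit is that a register-to-register ADD can be encoded two ways, depending on which register is placed in REG. The following is a small sketch under stated assumptions: 32-bit operands (s = 1), register addressing mode (MOD = 11), the ADD opcode layout 000000ds, and a hypothetical helper name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_add_reg_reg(dest: str, src: str, d: int) -> bytes:
    """Encode ADD dest, src for 32-bit registers using the given d bit."""
    if d == 0:
        # d = 0: REG is the source, MOD R/M is the destination.
        reg, rm = REG32.index(src), REG32.index(dest)
    else:
        # d = 1: REG is the destination, MOD R/M is the source.
        reg, rm = REG32.index(dest), REG32.index(src)
    opcode = (d << 1) | 1                  # 000000d1, s = 1 for 32-bit data
    modrm = (0b11 << 6) | (reg << 3) | rm  # MOD = 11: register addressing
    return bytes([opcode, modrm])


# Both byte sequences perform ADD ECX, EAX.
for d in (0, 1):
    code = encode_add_reg_reg("ecx", "eax", d)
    print(f"d={d}:", " ".join(f"{b:02X}" for b in code))
```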

7. MOD R/M Byte and Addressing Modes

MOD R/M Addressing Mode
=== === ================================
 00 000 [ eax ]
 01 000 [ eax + disp8 ]               (1)
 10 000 [ eax + disp32 ]
 11 000 register  ( al / ax / eax )   (2)
 00 001 [ ecx ]
 01 001 [ ecx + disp8 ]
 10 001 [ ecx + disp32 ]
 11 001 register  ( cl / cx / ecx )
 00 010 [ edx ]
 01 010 [ edx + disp8 ]
 10 010 [ edx + disp32 ]
 11 010 register  ( dl / dx / edx )
 00 011 [ ebx ]
 01 011 [ ebx + disp8 ]
 10 011 [ ebx + disp32 ]
 11 011 register  ( bl / bx / ebx )
 00 100 SIB  Mode                     (3)
 01 100 SIB  +  disp8  Mode
 10 100 SIB  +  disp32  Mode
 11 100 register  ( ah / sp / esp )
 00 101 32-bit Displacement-Only Mode (4)
 01 101 [ ebp + disp8 ]
 10 101 [ ebp + disp32 ]
 11 101 register  ( ch / bp / ebp )
 00 110 [ esi ]
 01 110 [ esi + disp8 ]
 10 110 [ esi + disp32 ]
 11 110 register  ( dh / si / esi )
 00 111 [ edi ]
 01 111 [ edi + disp8 ]
 10 111 [ edi + disp32 ]
 11 111 register  ( bh / di / edi )

  1. In addressing modes with an 8-bit displacement, the displacement falls in the range -128..+127 and takes only a single byte after the addressing mode byte(s), which makes the instruction shorter.
  2. The size bit in the opcode selects an 8-bit or a 32-bit register. Selecting a 16-bit register requires a prefix byte.
  3. The so-called scaled indexed addressing modes. SIB stands for scaled index byte.
  4. Note that there is no [ ebp ] addressing mode. Its slot is occupied by the 32-bit displacement-only addressing mode. Intel decided that programmers can use the [ ebp + disp8 ] addressing mode instead, with its 8-bit displacement set to zero (the instruction is one byte longer, though).
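
The table above can be condensed into a short lookup routine. This is an illustrative sketch for 32-bit addressing only; describe_modrm is a made-up name, and register addressing is reported with the 32-bit register name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_modrm(mod: int, rm: int) -> str:
    """Describe the addressing mode selected by the MOD and R/M fields."""
    if mod == 0b11:
        return f"register {REG32[rm]}"
    if mod == 0b00 and rm == 0b101:
        return "[ disp32 ]"            # displacement-only, replaces [ ebp ]
    base = "SIB" if rm == 0b100 else REG32[rm]
    suffix = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    return f"[ {base}{suffix} ]"


for mod in range(4):
    print(f"{mod:02b} 101 ->", describe_modrm(mod, 0b101))
```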

8. SIB (Scaled Index Byte) Layout

Scaled index byte layout:

SIB, Scaled index byte layout

Scale Value  Index*Scale Value
===========  =================
     00      Index*1
     01      Index*2
     10      Index*4
     11      Index*8

Index  Register
=====  ========
 000   EAX
 001   ECX
 010   EDX
 011   EBX
 100   Illegal
 101   EBP
 110   ESI
 111   EDI

Base  MOD     Register
====  ======  =================
 000  xx      EAX
 001  xx      ECX
 010  xx      EDX
 011  xx      EBX
 100  xx      ESP
 101  00      Displacement-only
 101  01, 10  EBP
 110  xx      ESI
 111  xx      EDI

8.1. Scaled Indexed Addressing Mode

[ reg32 + eax*n ] MOD = 00
[ reg32 + ebx*n ] 
[ reg32 + ecx*n ]
[ reg32 + edx*n ]
[ reg32 + ebp*n ]
[ reg32 + esi*n ]
[ reg32 + edi*n ]

[ disp8 + reg32 + eax*n ] MOD = 01
[ disp8 + reg32 + ebx*n ]
[ disp8 + reg32 + ecx*n ]
[ disp8 + reg32 + edx*n ]
[ disp8 + reg32 + ebp*n ]
[ disp8 + reg32 + esi*n ]
[ disp8 + reg32 + edi*n ]

[ disp32 + reg32 + eax*n ] MOD = 10
[ disp32 + reg32 + ebx*n ]
[ disp32 + reg32 + ecx*n ]
[ disp32 + reg32 + edx*n ]
[ disp32 + reg32 + ebp*n ]
[ disp32 + reg32 + esi*n ]
[ disp32 + reg32 + edi*n ]

[ disp32 + eax*n ] MOD = 00, and
[ disp32 + ebx*n ] BASE field = 101
[ disp32 + ecx*n ]
[ disp32 + edx*n ]
[ disp32 + ebp*n ]
[ disp32 + esi*n ]
[ disp32 + edi*n ]

Note: n = 1, 2, 4, or 8.

In each scaled indexed addressing mode the MOD field in MOD-REG-R/M byte specifies the size of the displacement. It can be zero, one, or four bytes:

    MOD R/M  Addressing Mode
    --- ---  --------------------------- 
     00 100  SIB
     01 100  SIB + disp8
     10 100  SIB + disp32

The Base and Index fields of the SIB byte select the base and index registers, respectively.

Note that this addressing mode does not allow the use of the ESP register as an index register. Presumably, Intel left this particular mode undefined to provide the ability to extend the addressing modes in a future version of the CPU.
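
To make the SIB tables concrete, here is a minimal decoding sketch. It assumes the usual SIB bit layout (scale in bits 7-6, index in bits 5-3, base in bits 2-0) and treats index value 100 as "no index register", which is how the encoding behaves in practice; the name decode_sib is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib: int, mod: int) -> str:
    """Describe the memory operand selected by a SIB byte and the MOD field."""
    scale = 1 << ((sib >> 6) & 0b11)   # 00, 01, 10, 11 -> 1, 2, 4, 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    parts = []
    if base == 0b101 and mod == 0b00:
        parts.append("disp32")         # displacement-only base
    else:
        parts.append(REG32[base])
    if index != 0b100:                 # 100 means no index register
        parts.append(f"{REG32[index]}*{scale}")
    if mod == 0b01:
        parts.append("disp8")
    elif mod == 0b10:
        parts.append("disp32")
    return "[ " + " + ".join(parts) + " ]"


print(decode_sib(0xBB, 0b00))
print(decode_sib(0x05, 0b00))
```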

9. Examples

9.1. Encoding ADD Instruction Example

9.2 Encoding ADD CL, AL Instruction
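
A possible way to work this encoding out by hand, written as a short script. It assumes the ADD opcode layout 000000ds and picks d = 0, so that AL (the source) goes in REG and CL (the destination) goes in R/M; the d = 1 form would be an equally valid alternative.

```python
# Register names indexed by the 3-bit field value, for 8-bit data.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

d, s = 0, 0                      # REG is the source; 8-bit operands
opcode = (d << 1) | s            # 000000ds
modrm = (0b11 << 6) | (REG8.index("al") << 3) | REG8.index("cl")
encoding = bytes([opcode, modrm])

print(" ".join(f"{b:02X}" for b in encoding))
```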

9.3. Encoding ADD ECX, EAX Instruction

9.4. Encoding ADD EDX, DISPLACEMENT Instruction

9.5. Encoding ADD EDI, [EBX] Instruction

9.6. Encoding ADD EAX, [ ESI + disp8 ] Instruction
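
A sketch of how this encoding could be derived: d = 1 because the register EAX is the destination, s = 1 for 32-bit data, MOD = 01 for a one-byte displacement, and R/M = 110 for ESI. The helper name and the sample displacement values are illustrative only.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_add_eax_esi_disp8(disp: int) -> bytes:
    """Encode ADD EAX, [ ESI + disp8 ] with a signed 8-bit displacement."""
    if not -128 <= disp <= 127:
        raise ValueError("disp8 must be in the range -128..+127")
    opcode = (1 << 1) | 1  # d = 1, s = 1
    modrm = (0b01 << 6) | (REG32.index("eax") << 3) | REG32.index("esi")
    # The displacement byte follows the MOD-REG-R/M byte.
    return bytes([opcode, modrm]) + struct.pack("<b", disp)


print(" ".join(f"{b:02X}" for b in encode_add_eax_esi_disp8(8)))
```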

9.7. Encoding ADD EBX, [ EBP + disp32 ] Instruction

9.8. Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction

9.9. Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
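
This encoding needs a SIB byte. The sketch below assumes MOD = 00 (no displacement), R/M = 100 to announce the SIB byte, and d = 1 because the register is the destination. The function name is made up, and the two checks reflect the restrictions noted in the SIB section.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_add_reg_base_index(dest: str, base: str, index: str, scale: int) -> bytes:
    """Encode ADD dest, [ base + index*scale ] with no displacement."""
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    if base == "ebp":
        raise ValueError("with MOD = 00, base 101 means displacement-only")
    opcode = 0x03  # ADD with d = 1, s = 1
    modrm = (0b00 << 6) | (REG32.index(dest) << 3) | 0b100  # R/M = 100: SIB follows
    sib = (SCALE_BITS[scale] << 6) | (REG32.index(index) << 3) | REG32.index(base)
    return bytes([opcode, modrm, sib])


code = encode_add_reg_base_index("ecx", "ebx", "edi", 4)
print(" ".join(f"{b:02X}" for b in code))
```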

10. Encoding ADD Immediate Instruction

Encoding x86 immediate operands:

Encoding Immediate Operands

MOD-REG-R/M and SIB bytes have no bit combinations to specify an immediate operand.

Instead, x86 uses an entirely different instruction format for instructions with an immediate operand.

There are three rules that apply:

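  1. If the high-order bit of the opcode is set to 1, then the instruction has an immediate constant.
  2. There is no direction bit in the opcode: a constant cannot be a destination operand, so the destination is always the location encoded by the MOD and R/M fields.
  3. The third difference between the ADD-immediate and the standard ADD instruction is the meaning of the REG field in the MOD-REG-R/M byte: because the source operand is a constant, the REG field is not needed to specify an operand and holds an opcode extension instead.

Note that when adding a constant to a memory location, the displacement (if any) immediately precedes the immediate (constant) value in the instruction byte sequence.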
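
As a hedged illustration of the immediate format, the sketch below encodes ADD reg32, constant. The opcode values are taken from the Intel instruction set reference rather than from the text above: 81h takes a 32-bit constant, and 83h takes an 8-bit constant that the CPU sign-extends. In both cases the REG field holds the opcode extension 000 for ADD. The function name is hypothetical.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_add_reg_imm(dest: str, value: int) -> bytes:
    """Encode ADD dest, value for a 32-bit register destination."""
    # MOD = 11 (register), REG = 000 (opcode extension for ADD), R/M = dest.
    modrm = (0b11 << 6) | (0b000 << 3) | REG32.index(dest)
    if -128 <= value <= 127:
        # Short form: opcode 83h with a sign-extended 8-bit constant.
        return bytes([0x83, modrm]) + struct.pack("<b", value)
    # Long form: opcode 81h with a 32-bit little-endian constant.
    return bytes([0x81, modrm]) + struct.pack("<I", value & 0xFFFFFFFF)


for value in (5, 0x12345678):
    code = encode_add_reg_imm("ecx", value)
    print(" ".join(f"{b:02X}" for b in code))
```

Both opcodes have their high-order bit set, in line with rule 1.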

11. Encoding Eight, Sixteen, and Thirty-Two Bit Operands

x86 ADD Opcode:

x86 ADD Opcode

11.1. Encoding Sixteen Bit Operands

x86 instruction format:

instruction format

32-bit programs don't use 16-bit operands that often, but they do need them now and then.

To allow for 16-bit operands, Intel lets you prefix a 32-bit mode instruction with the operand size prefix byte, whose value is 66h.

This prefix byte tells the CPU to operate on 16-bit data rather than 32-bit data.
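
The effect of the prefix can be sketched by encoding the same register-to-register ADD for 32-bit and 16-bit operands. The example assumes code running in 32-bit mode, d = 0 (REG is the source), and an invented helper name; only the leading 66h byte differs between the two encodings.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_add_reg_reg(dest: str, src: str) -> bytes:
    """Encode ADD dest, src in 32-bit mode for 16-bit or 32-bit registers."""
    if dest in REG16 and src in REG16:
        names, prefix = REG16, b"\x66"   # operand size prefix selects 16-bit data
    elif dest in REG32 and src in REG32:
        names, prefix = REG32, b""
    else:
        raise ValueError("operands must be two 16-bit or two 32-bit registers")
    opcode = 0x01                        # ADD with d = 0, s = 1
    modrm = (0b11 << 6) | (names.index(src) << 3) | names.index(dest)
    return prefix + bytes([opcode, modrm])


for dest, src in (("ecx", "eax"), ("cx", "ax")):
    code = encode_add_reg_reg(dest, src)
    print(f"ADD {dest}, {src}:", " ".join(f"{b:02X}" for b in code))
```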

12. x86 Instruction Prefix Bytes

13. Alternate Encodings for Instructions

14. x86 Opcode Summary

14.1. MOD-REG-R/M Byte Summary

MOD-REG-R/M Byte:

MOD-REG-R/M Byte

15. ISA Design Considerations

15.1. ISA Design Challenges

16. Intel Architecture Software Developer's Manual

The classic Intel Pentium II Architecture Software Developer's Manual contains three parts:
  1. Volume 1, Intel Basic Architecture: Order Number 243190, PDF, 2.6 MB.
  2. Volume 2, Instruction Set Reference: Order Number 243191, PDF, 6.6 MB.
  3. Volume 3, System Programming Guide: Order Number 243192, PDF, 5.1 MB.

It is highly recommended that you download the above manuals and use them as a reference.

16.1. Intel Instruction Set Reference (Volume 2)

16.2. Chapter 3 of Intel Instruction Set Reference

16.3. Intel Reference Opcode Bytes

16.4. Intel Reference Opcode Bytes, Cont.

16.5. Intel Reference Opcode Bytes, Cont.

16.6. Intel Reference Opcode Bytes, Cont.

16.7. Intel Reference Opcode Bytes, Cont.

16.8. Intel Reference Opcode Bytes, Cont.

16.9. Intel Reference Instruction Column