6502 instruction reference

The NES CPU is a Ricoh 2A03 — a 6502 with no decimal mode. Each cycle below is one memory access on the bus; the CPU never sits idle, so even "do nothing" steps show up as discarded reads.

CPU flow

The CPU is built to execute instructions sequentially, one by one, by reading them from memory. An instruction is 1, 2, or 3 bytes long, and the first byte is always the opcode ("operation code") — a one-byte number that identifies the instruction.

Addresses on this CPU are 16 bits long (2 bytes) — that's a 64 KiB address space, $0000 to $FFFF.

The CPU tracks where it's reading from with a 16-bit register called PC (program counter): the address of the next byte to read. The basic operation "read [PC]; PC++" is called a fetch. Executing an instruction is a sequence of fetches: first the opcode, then the remaining bytes (if any).

Cycle 1 of every instruction is therefore the same: fetch the opcode to identify the instruction.

Example

Say PC = $8000 and memory contains:

$8000   C9
$8001   42
$8002   ...

The CPU is about to start a new instruction. It fetches the opcode:

cycle 1   read [$8000] → C9       (opcode); PC = $8001

C9 is the opcode for an instruction that does something specific with one operand byte (what exactly isn't important here). The CPU fetches the operand:

cycle 2   read [$8001] → 42       (operand); PC = $8002

PC sits at $8002, ready for the next opcode fetch.
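The two fetches above can be sketched in a few lines of Python. This is a minimal illustration, not a full core: the Cpu class, its fetch method, and the flat 64 KiB bytearray used as memory are hypothetical names chosen for this example.

```python
class Cpu:
    """Hypothetical minimal CPU state: just memory, PC and a cycle counter."""

    def __init__(self, memory, pc):
        self.memory = memory  # assumed: flat 64 KiB bytearray, no mappers
        self.pc = pc
        self.cycles = 0

    def fetch(self):
        """read [PC]; PC++ (costs one bus cycle)."""
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF  # PC is 16 bits wide
        self.cycles += 1
        return value


memory = bytearray(0x10000)
memory[0x8000] = 0xC9
memory[0x8001] = 0x42

cpu = Cpu(memory, 0x8000)
opcode = cpu.fetch()   # cycle 1
operand = cpu.fetch()  # cycle 2
```

After the two calls, PC has advanced by two and is ready for the next opcode fetch.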

Instructions

Since the opcode is one byte, there are 256 possible instructions, one per value from $00 to $FF. Each opcode is its own instruction with its own fixed behavior.

Many instructions also read the one or two bytes that follow the opcode. These bytes are called operands, and they supply the data or the address the instruction works on.

Almost every instruction is a (mnemonic, addressing mode) pair: a mnemonic that says what to do and a mode that says where the operand lives. The opcode byte just encodes that pair. LDA $42 (opcode A5) and LDA #$42 (opcode A9) are two different instructions at the silicon level — different cycles, different bus pattern — but they share an idea, which is why the assembler lets you write both as "LDA".

There's a small group of instructions that don't fit this. Their cycle sequence doesn't decompose into mnemonic + mode; they're bespoke. I call these special. JSR and RTS, which get their own cycle listings further down, are two of them.

For everything else, knowing the mnemonic and the mode tells you the cycles, the bus accesses, and the order — only the value computed changes from one mnemonic to the next.
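One way to picture the (mnemonic, addressing mode) pairing is a lookup table keyed by opcode. The sketch below is an assumption about how an emulator might organise it; OPCODES, MODE_LENGTH and decode are hypothetical names, and the table holds only a handful of opcodes taken from the tables on this page.

```python
# Illustrative subset only; opcode values come from the tables on this page.
OPCODES = {
    0xA9: ("LDA", "Imm"),
    0xA5: ("LDA", "ZP"),
    0x85: ("STA", "ZP"),
    0x69: ("ADC", "Imm"),
    0xE8: ("INX", "Imp"),
    0xD0: ("BNE", "Rel"),
}

# Instruction length in bytes (opcode included) depends on the mode alone.
MODE_LENGTH = {
    "Imp": 1, "A": 1,
    "Imm": 2, "ZP": 2, "ZPX": 2, "ZPY": 2, "IzX": 2, "IzY": 2, "Rel": 2,
    "Abs": 3, "AbsX": 3, "AbsY": 3, "Ind": 3,
}


def decode(opcode):
    """Return (mnemonic, mode, length in bytes) for a known opcode."""
    mnemonic, mode = OPCODES[opcode]
    return mnemonic, mode, MODE_LENGTH[mode]
```

With this shape, LDA #$42 and LDA $42 decode to the same mnemonic but different modes, which is the split described above.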

Registers

Programmer-visible (software can read/write these directly):

The following latches exist in some form in the real hardware. They are simplified here, but kept in enough detail for accurate emulation:

Addressing modes

Notation: [X] means "value at address X". Addresses are written high:low when assembled across cycles — typically ADH:ADL for the operand address being built, or PCH:PCL for the program counter. Cycle 1 (opcode fetch) is omitted from every mode below.

Implied (Imp) — 2 cycles

 2   read [PC] (discarded); do the op

Accumulator (A) — 2 cycles

 2   read [PC] (discarded); op on A

Immediate (Imm) — 2 cycles

 2   read [PC] → DL, PC++       (DL = operand)

Zero Page (ZP) — 3 cycles

 2   read [PC] → ADL, PC++; ADH = 0
 3   read [ADH:ADL] → DL

Zero Page,X (ZPX) — 4 cycles

 2   read [PC] → ADL, PC++; ADH = 0
 3   read [ADH:ADL] (discarded); ADL = ADL + X       (wraps in ZP)
 4   read [ADH:ADL] → DL

ZPY is the same with Y.
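The "wraps in ZP" remark is easy to get wrong in an emulator, so here is a small sketch of the effective-address calculation. The function name zpx_address is hypothetical.

```python
def zpx_address(base, x):
    """Effective address for Zero Page,X: the sum stays inside page zero."""
    return (base + x) & 0xFF  # ADH stays 0, any carry out of ADL is dropped


wrapped = zpx_address(0xF0, 0x20)  # $F0 + $20 wraps to $10, not $0110
plain = zpx_address(0x10, 0x05)    # no wrap: $15
```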

Absolute (Abs) — 4 cycles

 2   read [PC] → ADL, PC++
 3   read [PC] → ADH, PC++
 4   read [ADH:ADL] → DL

Absolute,X (AbsX) — 4 or 5 cycles

 2   read [PC] → ADL, PC++
 3   read [PC] → ADH, PC++; (ADL, CL) = ADL + X
 4   read [ADH:ADL] → DL
     if CL == 0: done (4)
 5   ADH = ADH + 1; read [ADH:ADL] → DL     (only if CL == 1)

AbsY is the same with Y.
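A rough sketch of the 4-or-5 cycle behaviour for a read instruction such as LDA, following the ADL/ADH/CL steps above. The name absx_read and the flat bytearray memory are assumptions made for this example; the discarded read at the unfixed address is not modelled.

```python
def absx_read(memory, lo, hi, x):
    """Return (value, cycles) for an Absolute,X read with operand bytes lo, hi."""
    total = lo + x
    adl, cl = total & 0xFF, total >> 8  # (ADL, CL) = ADL + X
    cycles = 4
    if cl:
        hi = (hi + 1) & 0xFF            # cycle 5: fix up ADH, read again
        cycles = 5
    return memory[(hi << 8) | adl], cycles


memory = bytearray(0x10000)
memory[0x2015] = 0x33
memory[0x2110] = 0x77

same_page = absx_read(memory, 0x10, 0x20, 0x05)   # $2010 + 5, no page cross
cross_page = absx_read(memory, 0xF0, 0x20, 0x20)  # $20F0 + $20 crosses a page
```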

(Indirect,X) (IzX) — 6 cycles

 2   read [PC] → ADL, PC++; ADH = 0          (ADL = ZP base address)
 3   read [ADH:ADL] (discarded); ADL = ADL + X       (wraps in ZP)
 4   read [ADH:ADL] → IAL                    (pointer low, parked)
 5   ADL = ADL + 1 (wraps); read [ADH:ADL] → ADH; ADL = IAL
 6   read [ADH:ADL] → DL

(Indirect),Y (IzY) — 5 or 6 cycles

 2   read [PC] → ADL, PC++; ADH = 0          (ADL = ZP address)
 3   read [ADH:ADL] → IAL                    (pointer low, parked)
 4   ADL = ADL + 1 (wraps); read [ADH:ADL] → ADH; ADL = IAL
 5   (ADL, CL) = ADL + Y
     read [ADH:ADL] → DL
     if CL == 0: done (5)
 6   ADH = ADH + 1; read [ADH:ADL] → DL     (only if CL == 1)
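The two indirect modes differ in when the index is applied: before the pointer is read for (Indirect,X), after it for (Indirect),Y. A minimal sketch under the assumption of a flat bytearray memory, with hypothetical function names:

```python
def izx_address(memory, zp, x):
    """(Indirect,X): index the zero-page pointer location, then read the pointer."""
    ptr = (zp + x) & 0xFF              # wraps in ZP
    lo = memory[ptr]
    hi = memory[(ptr + 1) & 0xFF]      # pointer high byte also wraps in ZP
    return (hi << 8) | lo


def izy_address(memory, zp, y):
    """(Indirect),Y: read the pointer, then add Y. Returns (address, cycles)."""
    lo = memory[zp]
    hi = memory[(zp + 1) & 0xFF]
    crossed = (lo + y) > 0xFF          # CL == 1
    address = (((hi << 8) | lo) + y) & 0xFFFF
    return address, (6 if crossed else 5)


memory = bytearray(0x10000)
memory[0xFF] = 0x34  # pointer low byte at $FF
memory[0x00] = 0x12  # pointer high byte wraps around to $00
```

The cycle counts returned by izy_address apply to read instructions.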

Relative (Rel) — 2 / 3 / 4 cycles

 2   read [PC] → DL, PC++        (DL = signed offset)
     if branch not taken: done (2)
 3   NEGATIVE = DL bit 7
     (PCL, CL) = PCL + DL        (treat DL as unsigned for the add)
     read [PCH:PCL] (discarded)
     if NEGATIVE == CL: PC final, done (3)
 4   PCH = PCH + (NEGATIVE ? -1 : +1)
     read [PCH:PCL] (discarded)

Indirect (Ind) — 5 cycles, JMP only

 2   read [PC] → IAL, PC++       (pointer low, parked)
 3   read [PC] → ADH, PC++       (pointer high → ADH)
 4   read [ADH:IAL] → PCL        (target low byte)
 5   IAL = IAL + 1               (no carry into ADH — bug)
     read [ADH:IAL] → PCH        (target high byte)

JMP ($02FF) reads the high byte from $0200, not $0300. The increment of IAL at cycle 5 doesn't carry into ADH — classic 6502 bug, not fixed in the 2A03.
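A small sketch that reproduces the bug, assuming a flat bytearray memory. The function name jmp_indirect_target is hypothetical.

```python
def jmp_indirect_target(memory, pointer):
    """Target of JMP (pointer), including the page-wrap bug."""
    lo = memory[pointer]
    # Only the low byte of the pointer is incremented; the high byte is kept.
    hi_address = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    hi = memory[hi_address]
    return (hi << 8) | lo


memory = bytearray(0x10000)
memory[0x02FF] = 0x34  # target low byte
memory[0x0200] = 0x12  # what the CPU actually reads as the high byte
memory[0x0300] = 0x99  # what a "fixed" CPU would read instead

target = jmp_indirect_target(memory, 0x02FF)
```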

LDA — load accumulator

A := M. Flags: N - - - - - Z -.

Mode Op
Imm A9
ZP A5
ZPX B5
Abs AD
AbsX BD
AbsY B9
IzX A1
IzY B1
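In the flag notation above, LDA touches only N and Z. A minimal sketch of that update, with a hypothetical helper name and a plain dict standing in for the status register:

```python
def nz_flags(value):
    """N and Z as set by a load of the 8-bit value."""
    return {"N": (value >> 7) & 1, "Z": int(value == 0)}
```

The same helper would serve any other instruction whose flag line reads N - - - - - Z -, such as INX.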

STA — store accumulator

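M := A. Flags: unaffected. No immediate mode. In every mode the final cycle is a write of A to [ADH:ADL] instead of a read, and the indexed modes never take the short path: AbsX and AbsY are always 5 cycles, IzY is always 6.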

Mode Op
ZP 85
ZPX 95
Abs 8D
AbsX 9D
AbsY 99
IzX 81
IzY 91

BIT — bit test

Z := (A AND M) == 0, N := M[7], V := M[6]. Flags: N V - - - - Z -. A is not modified.

Mode Op
ZP 24
Abs 2C
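The three flag rules for BIT translate directly into code. A minimal sketch, with a hypothetical function name and a dict standing in for the status register:

```python
def bit_flags(a, m):
    """Flags produced by BIT; A itself is left untouched."""
    return {
        "Z": int((a & m) == 0),
        "N": (m >> 7) & 1,  # copied from the memory operand, not the AND result
        "V": (m >> 6) & 1,
    }
```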

ADC — add with carry

A := A + M + C. Flags: N V - - - - Z C. V is set when the signed result doesn't fit in 8 bits, i.e., A and M have the same sign and the result's sign differs from theirs. C is the carry out of bit 7.

Mode Op
Imm 69
ZP 65
ZPX 75
Abs 6D
AbsX 7D
AbsY 79
IzX 61
IzY 71
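The overflow rule is the part of ADC most often implemented incorrectly. The sketch below uses a common bit trick for it; the function name adc and the dict of flags are assumptions made for this example.

```python
def adc(a, m, c):
    """Return (new A, flags) for A + M + C on 8-bit values."""
    total = a + m + c
    result = total & 0xFF
    # V: result's sign differs from both operands' signs, which can only
    # happen when the operands share a sign.
    overflow = ((a ^ result) & (m ^ result) & 0x80) >> 7
    flags = {
        "N": result >> 7,
        "V": overflow,
        "Z": int(result == 0),
        "C": total >> 8,
    }
    return result, flags
```

For instance, $50 + $50 gives $A0: two positive operands produce a negative-looking result, so V is set while C stays clear.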

INX — increment X

X := X + 1. Flags: N - - - - - Z -. Implied only.

Mode Op
Imp E8

JSR — jump to subroutine

Single opcode 20. 6 cycles.

 2   read [PC] → ADL, PC++           (target low parked in ADL)
 3   read [01:SP] (discarded; internal operation)
 4   write [01:SP] ← PCH; SP--
 5   write [01:SP] ← PCL; SP--
 6   read [PC] → PCH; PCL = ADL

The address pushed is that of JSR's last byte (the high operand byte), not of the next instruction. RTS pops it and increments it by 1, so the round trip lands on the byte right after the JSR.

RTS — return from subroutine

Single opcode 60. 6 cycles.

 2   read [PC] (discarded)
 3   read [01:SP] (discarded)
 4   SP++; read [01:SP] → PCL
 5   SP++; read [01:SP] → PCH
 6   PC = PCH:PCL; read [PC] (discarded); PC++
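The JSR/RTS round trip can be sketched as follows. This models the register effects only, not the individual bus cycles; the Cpu class, its method names, and the starting SP value of $FD are assumptions made for this example.

```python
class Cpu:
    """Hypothetical minimal state: memory, PC and an 8-bit stack pointer."""

    def __init__(self, memory, pc, sp=0xFD):
        self.memory, self.pc, self.sp = memory, pc, sp

    def push(self, value):
        self.memory[0x0100 | self.sp] = value  # stack lives in page $01
        self.sp = (self.sp - 1) & 0xFF

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 | self.sp]

    def jsr(self):
        # Precondition: the opcode was fetched, PC points at the low operand.
        target_lo = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF       # PC now at the high operand
        self.push(self.pc >> 8)                # PCH first
        self.push(self.pc & 0xFF)              # then PCL
        self.pc = (self.memory[self.pc] << 8) | target_lo

    def rts(self):
        lo = self.pop()
        hi = self.pop()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF


memory = bytearray(0x10000)
memory[0x8000:0x8003] = bytes([0x20, 0x00, 0x90])  # JSR $9000

cpu = Cpu(memory, 0x8001)  # opcode at $8000 already fetched
cpu.jsr()
pc_in_subroutine = cpu.pc
pushed = (memory[0x01FD] << 8) | memory[0x01FC]
cpu.rts()
```

The pushed value is $8002, the address of the JSR's last byte, and RTS returns to $8003.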

BNE — branch if not equal (Z == 0)

Relative mode. Cycle cost per the Rel breakdown above.

Mode Op
Rel D0

JMP — unconditional jump

Mode Op
Abs 4C
Ind 6C

Notes: indirect JMP has the page-boundary bug detailed in the Ind mode section.

What's not on this page yet

The remaining ~45 instructions follow the same shape. The two interesting outliers are the illegal opcodes (undocumented but real operations like LAX, SAX) and the interrupt instructions (BRK, RTI, plus the reset/IRQ/NMI vector machinery), which are state-machine topics rather than pure instruction encodings.