6502 instruction reference
Last updated:
The NES CPU is a Ricoh 2A03 — a 6502 with no decimal mode. Each cycle below is one memory access on the bus; the CPU never sits idle, so even "do nothing" steps show up as discarded reads.
CPU flow
The CPU is built to execute instructions sequentially, one by one, by reading them from memory. An instruction is 1, 2, or 3 bytes long, and the first byte is always the opcode ("operation code") — a one-byte number that identifies the instruction.
Addresses on this CPU are 16 bits long (2 bytes) — that's a 64 KiB
address space, $0000 to $FFFF.
The CPU tracks where it's reading from with a 16-bit register
called PC (program counter): the address of the next byte to read.
The basic operation read [PC]; PC++ is called a fetch. Executing an instruction is
a sequence of fetches — first the opcode, then the remaining bytes
(if any).
Cycle 1 of every instruction is therefore the same: fetch the opcode to identify the instruction.
Example
Say PC = $8000 and memory contains:
$8000 C9
$8001 42
$8002 ...
The CPU is about to start a new instruction. It fetches the opcode:
cycle 1 read [$8000] → C9 (opcode); PC = $8001
C9 is the opcode for an instruction that does something specific
with one operand byte (what exactly isn't important here). The CPU
fetches the operand:
cycle 2 read [$8001] → 42 (operand); PC = $8002
PC sits at $8002, ready for the next opcode fetch.
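The fetch loop above can be sketched in a few lines of Python. This is a minimal illustration that assumes memory is a flat 64 KiB `bytearray` with no mirroring or memory-mapped I/O (the real NES bus has both). The class `CPU` and its `read`/`fetch` methods are hypothetical names chosen for this sketch.

```python
# Hypothetical sketch: flat 64 KiB memory plus a 16-bit PC, nothing else.
class CPU:
    def __init__(self) -> None:
        self.mem = bytearray(0x10000)  # $0000-$FFFF, assumed flat (no mirroring)
        self.pc = 0

    def read(self, addr: int) -> int:
        return self.mem[addr & 0xFFFF]

    def fetch(self) -> int:
        """One fetch: read [PC]; PC++ (PC wraps at 16 bits)."""
        value = self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        return value


# Replay the example: C9 42 at $8000.
cpu = CPU()
cpu.mem[0x8000] = 0xC9
cpu.mem[0x8001] = 0x42
cpu.pc = 0x8000

opcode = cpu.fetch()   # cycle 1: opcode
operand = cpu.fetch()  # cycle 2: operand
print(f"opcode={opcode:02X} operand={operand:02X} PC=${cpu.pc:04X}")
# prints "opcode=C9 operand=42 PC=$8002"
```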
Instructions
Since the opcode is one byte, there are 256 possible instructions, one per value from $00 to $FF. Each opcode is its own instruction with its own fixed behavior.
Many instructions also read the one or two bytes that follow the opcode. These are called operands, and the instruction uses them as data or as an address.
Almost every instruction is a (mnemonic, addressing mode) pair: a
mnemonic that says what to do and a mode that says where the
operand lives. The opcode byte just encodes that pair. LDA $42
(opcode A5) and LDA #$42 (opcode A9) are two different instructions
at the silicon level — different cycles, different bus pattern — but
they share an idea, which is why the assembler lets you write both as
"LDA".
There's a small group of instructions that don't fit this. Their cycle sequences don't decompose into mnemonic + mode; they're bespoke. I call these special:
- JSR, RTS (subroutine call/return).
- BRK, RTI (interrupt entry/exit).
- PHA, PHP, PLA, PLP (stack push/pull).
- The interrupt service sequence itself (not an instruction, but the CPU runs a fixed cycle pattern on NMI/IRQ/RESET).
For everything else, knowing the mnemonic and the mode tells you the cycles, the bus accesses, and the order — only the value computed changes from one mnemonic to the next.
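The (mnemonic, mode) idea maps naturally onto a decode table. The sketch below is hypothetical: `DECODE`, `SPECIAL`, `BASE_CYCLES` and `describe` are illustrative names, and the table only covers opcodes that appear on this page plus the special group listed above. STA and JMP are left out on purpose: stores always take the indexed modes' longest path and absolute JMP never reads its target, so their cycle counts differ from the read pattern the base counts describe.

```python
# Hypothetical decode table: opcode -> (mnemonic, addressing mode).
DECODE: dict[int, tuple[str, str]] = {
    0xA9: ("LDA", "Imm"), 0xA5: ("LDA", "ZP"), 0xB5: ("LDA", "ZPX"),
    0xAD: ("LDA", "Abs"), 0xBD: ("LDA", "AbsX"), 0xB9: ("LDA", "AbsY"),
    0xA1: ("LDA", "IzX"), 0xB1: ("LDA", "IzY"),
    0x69: ("ADC", "Imm"), 0x65: ("ADC", "ZP"), 0x75: ("ADC", "ZPX"),
    0x6D: ("ADC", "Abs"), 0x7D: ("ADC", "AbsX"), 0x79: ("ADC", "AbsY"),
    0x61: ("ADC", "IzX"), 0x71: ("ADC", "IzY"),
    0x24: ("BIT", "ZP"), 0x2C: ("BIT", "Abs"),
    0xE8: ("INX", "Imp"),
    0xD0: ("BNE", "Rel"),
}

# Special opcodes: each has its own bespoke cycle sequence.
SPECIAL = {0x20: "JSR", 0x60: "RTS", 0x00: "BRK", 0x40: "RTI",
           0x48: "PHA", 0x08: "PHP", 0x68: "PLA", 0x28: "PLP"}

# Minimum cycles (including the opcode fetch) for read-style instructions,
# before any page-crossing or branch-taken penalty.
BASE_CYCLES = {"Imp": 2, "A": 2, "Imm": 2, "ZP": 3, "ZPX": 4, "ZPY": 4,
               "Abs": 4, "AbsX": 4, "AbsY": 4, "IzX": 6, "IzY": 5, "Rel": 2}


def describe(opcode: int) -> str:
    if opcode in SPECIAL:
        return f"{SPECIAL[opcode]}: special"
    mnemonic, mode = DECODE[opcode]
    return f"{mnemonic} {mode}: at least {BASE_CYCLES[mode]} cycles"
```

For example, `describe(0xB1)` gives `"LDA IzY: at least 5 cycles"`, and `describe(0x20)` gives `"JSR: special"`.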
Registers
Programmer-visible (software can read/write these directly):
A (accumulator, 8 bits) — main scratch register; target of most ALU operations.
X, Y (8 bits each) — index registers; used by indexed addressing modes and as loop counters.
PC (program counter, 16 bits) — address of the next byte to read. Split into PCL (low 8 bits) and PCH (high 8 bits); the cycle breakdowns refer to them separately when the CPU updates one half without the other.
SP (stack pointer, 8 bits) — low byte of the stack address. The stack lives in page 1, so the full address is 01:SP ($0100 + SP). A push writes to 01:SP and then decrements SP; a pull increments SP and then reads from 01:SP.
P (status register, 8 bits) — the flags:
| bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|
| flag | N | V | - | B | D | I | Z | C |
N (negative), V (overflow), B (break; only meaningful in the stacked copy pushed by PHP/BRK), D (decimal; ignored by the 2A03), I (interrupt disable), Z (zero), C (carry).
IR (instruction register, 8 bits) — holds the opcode of the instruction currently executing. Loaded at cycle 1.
AD (effective address latch, 16 bits) — where the current instruction's operand address is assembled during addressing. Split into ADL (low) and ADH (high). In the mode breakdowns, ADL and ADH are written in separate cycles and later read together as [ADH:ADL].
DL (data latch, 8 bits) — holds the most recent byte read from the bus; used to carry values between cycles.
The following latches exist in some form in the real hardware. They are simplified here, but the simplification is still enough for accurate emulation:
- CL (carry latch, 1 bit) — remembers the carry out of a low-byte addition (an index added to ADL, or a branch offset added to PCL) so that a later step can decide whether the high byte (ADH or PCH) needs a fix-up and a re-read.
- NEGATIVE (sign latch, 1 bit) — remembers the sign of a value across cycles, mainly for relative branches whose offset is negative.
- IAL (intermediate ALU latch, 8 bits) — parks an ALU result between cycles so the next bus transaction can consume it.
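The stack and status-register rules above can be sketched as follows. This is a hypothetical illustration: `CpuState`, `stacked_p` and the `FLAG_*` masks are names invented for the sketch, and the initial register values are placeholders, not the 2A03's power-up state.

```python
from dataclasses import dataclass, field

# Hypothetical bit masks for P, laid out N V - B D I Z C (bit 7 down to bit 0).
FLAG_N, FLAG_V, FLAG_U, FLAG_B = 0x80, 0x40, 0x20, 0x10
FLAG_D, FLAG_I, FLAG_Z, FLAG_C = 0x08, 0x04, 0x02, 0x01


@dataclass
class CpuState:
    # Placeholder initial values, not the real power-up state.
    a: int = 0
    x: int = 0
    y: int = 0
    pc: int = 0
    sp: int = 0xFF
    p: int = 0
    mem: bytearray = field(default_factory=lambda: bytearray(0x10000))

    def push(self, value: int) -> None:
        """Write to 01:SP, then decrement SP (wrapping within page 1)."""
        self.mem[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        """Increment SP, then read from 01:SP."""
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 | self.sp]


def stacked_p(p: int, from_software: bool) -> int:
    """P as pushed: bit 5 is set; B is 1 for PHP/BRK, 0 for IRQ/NMI."""
    p |= FLAG_U
    return p | FLAG_B if from_software else p & ~FLAG_B
```

Pushing $12 then $34 from SP = $FF leaves $12 at $01FF, $34 at $01FE and SP = $FD; two pulls return them in reverse order.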
Addressing modes
Notation: [X] means "value at address X". Addresses are written
high:low when assembled across cycles — typically ADH:ADL for the
operand address being built, or PCH:PCL for the program counter.
Cycle 1 (opcode fetch) is omitted from every mode below.
Implied (Imp) — 2 cycles
2 read [PC] (discarded); do the op
Accumulator (A) — 2 cycles
2 read [PC] (discarded); op on A
Immediate (Imm) — 2 cycles
2 read [PC] → DL, PC++ (DL = operand)
Zero Page (ZP) — 3 cycles
2 read [PC] → ADL, PC++; ADH = 0
3 read [ADH:ADL] → DL
Zero Page,X (ZPX) — 4 cycles
2 read [PC] → ADL, PC++; ADH = 0
3 read [ADH:ADL] (discarded); ADL = ADL + X (wraps in ZP)
4 read [ADH:ADL] → DL
ZPY is the same with Y.
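The zero-page wrap is the part of ZPX that's easy to get wrong. This hypothetical sketch (`zpx_read` is an illustrative name, and memory is assumed to be a flat 64 KiB array) records the two bus addresses of cycles 3 and 4 so the dummy read and the wrap are both visible.

```python
# Hypothetical helper: cycles 3-4 of a ZP,X read.
def zpx_read(mem: bytes, base: int, x: int) -> tuple[list[int], int]:
    """`base` is the operand byte from cycle 2. Returns (bus addresses, value)."""
    adh, adl = 0x00, base & 0xFF
    trace = [(adh << 8) | adl]      # cycle 3: dummy read of the unindexed address
    adl = (adl + x) & 0xFF          # ADL = ADL + X, wrapping inside page zero
    trace.append((adh << 8) | adl)  # cycle 4: the real read
    return trace, mem[(adh << 8) | adl]


mem = bytearray(0x10000)
mem[0x007F] = 0xAA  # where $80 + $FF lands after the wrap
mem[0x017F] = 0xBB  # where it would land without the wrap
trace, value = zpx_read(mem, 0x80, 0xFF)
print([f"${a:04X}" for a in trace], f"${value:02X}")  # ['$0080', '$007F'] $AA
```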
Absolute (Abs) — 4 cycles
2 read [PC] → ADL, PC++
3 read [PC] → ADH, PC++
4 read [ADH:ADL] → DL
Absolute,X (AbsX) — 4 or 5 cycles
2 read [PC] → ADL, PC++
3 read [PC] → ADH, PC++; (ADL, CL) = ADL + X
4 read [ADH:ADL] → DL
if CL == 0: done (4)
5 ADH = ADH + 1; read [ADH:ADL] → DL (only if CL == 1)
AbsY is the same with Y.
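The page-crossing decision for an AbsX read can be sketched like this. It is a hypothetical helper (`absx_read` is an illustrative name) that assumes a flat 64 KiB memory and covers cycles 4 and 5 only; ADL and ADH are the two operand bytes from cycles 2 and 3.

```python
# Hypothetical helper: cycles 4-5 of an AbsX read.
def absx_read(mem: bytes, adl: int, adh: int, x: int) -> tuple[int, int, list[int]]:
    """Returns (value, total cycles, bus addresses read in cycles 4 and 5)."""
    total = adl + x
    adl, cl = total & 0xFF, total >> 8    # (ADL, CL) = ADL + X
    trace = [(adh << 8) | adl]            # cycle 4: ADH not fixed yet
    if cl == 0:
        return mem[trace[-1]], 4, trace
    adh = (adh + 1) & 0xFF                # cycle 5: carry into ADH
    trace.append((adh << 8) | adl)        # re-read at the corrected address
    return mem[trace[-1]], 5, trace


mem = bytearray(0x10000)
mem[0x2010] = 0x11
mem[0x2110] = 0x22
value, cycles, trace = absx_read(mem, 0xF0, 0x20, 0x20)  # $20F0,X with X = $20
# value = $22, cycles = 5, trace = [$2010, $2110]
```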
(Indirect,X) (IzX) — 6 cycles
2 read [PC] → ADL, PC++; ADH = 0 (ADL = ZP base address)
3 read [ADH:ADL] (discarded); ADL = ADL + X (wraps in ZP)
4 read [ADH:ADL] → IAL (pointer low, parked)
5 ADL = ADL + 1 (wraps); read [ADH:ADL] → ADH; ADL = IAL
6 read [ADH:ADL] → DL
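Both pointer reads in IzX stay in page zero, including the increment in cycle 5. A hypothetical sketch (`izx_address` is an illustrative name; flat memory assumed) that only computes the effective address cycle 6 reads from:

```python
# Hypothetical helper: effective address for (zp,X).
def izx_address(mem: bytes, base: int, x: int) -> int:
    ptr = (base + x) & 0xFF        # cycle 3: ADL = ADL + X, wraps in ZP
    lo = mem[ptr]                  # cycle 4: pointer low, parked in IAL
    hi = mem[(ptr + 1) & 0xFF]     # cycle 5: ADL + 1 also wraps in ZP
    return (hi << 8) | lo          # cycle 6 reads [ADH:ADL] from here


mem = bytearray(0x10000)
mem[0x00FF] = 0x34
mem[0x0000] = 0x12  # pointer high comes from $0000 ...
mem[0x0100] = 0x99  # ... not from $0100
```

With this memory, `izx_address(mem, 0xFE, 0x01)` returns $1234.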
(Indirect),Y (IzY) — 5 or 6 cycles
2 read [PC] → ADL, PC++; ADH = 0 (ADL = ZP address)
3 read [ADH:ADL] → IAL (pointer low, parked)
4 ADL = ADL + 1 (wraps); read [ADH:ADL] → ADH; ADL = IAL
5 (ADL, CL) = ADL + Y
read [ADH:ADL] → DL
if CL == 0: done (5)
6 ADH = ADH + 1; read [ADH:ADL] → DL (only if CL == 1)
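IzY adds Y after the pointer is read, so only the final address can cross a page. A hypothetical sketch (`izy_read` is an illustrative name; flat memory assumed) returning the effective address, the loaded value, and the cycle count:

```python
# Hypothetical helper: (zp),Y read.
def izy_read(mem: bytes, base: int, y: int) -> tuple[int, int, int]:
    """Returns (effective address, value, total cycles)."""
    lo = mem[base & 0xFF]                  # cycle 3: pointer low -> IAL
    hi = mem[(base + 1) & 0xFF]            # cycle 4: pointer high, wraps in ZP
    total = lo + y
    adl, cl = total & 0xFF, total >> 8     # cycle 5: (ADL, CL) = ADL + Y
    if cl == 0:
        addr = (hi << 8) | adl
        return addr, mem[addr], 5
    addr = (((hi + 1) & 0xFF) << 8) | adl  # cycle 6: fix ADH, re-read
    return addr, mem[addr], 6


mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0xF0, 0x20  # pointer at $10 holds $20F0
mem[0x2100] = 0x5A
```

Here `izy_read(mem, 0x10, 0x10)` crosses into page $21 and takes 6 cycles, while `izy_read(mem, 0x10, 0x05)` stays at $20F5 and takes 5.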
Relative (Rel) — 2 / 3 / 4 cycles
2 read [PC] → DL, PC++ (DL = signed offset)
if branch not taken: done (2)
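3 read [PCH:PCL] (discarded; PC still points just past the operand)
NEGATIVE = DL bit 7
(PCL, CL) = PCL + DL (treat DL as unsigned for the add)
if NEGATIVE == CL: PC final, done (3)
4 read [PCH:PCL] (discarded; PCH not fixed yet)
PCH = PCH + (NEGATIVE ? -1 : +1)
NEGATIVE == CL means the target stays in the same page: a forward offset whose add didn't carry, or a backward offset whose unsigned add did (PCL $10 + $F0 wraps to $00 with CL = 1, the same as $10 - 16).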
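The NEGATIVE/CL test can be sketched as a function that returns the branch target and cycle count. This is a hypothetical helper (`branch` is an illustrative name) and deliberately models only the result, not the bus addresses. `pc` is PC after cycle 2, and `offset` is the raw operand byte.

```python
# Hypothetical helper: relative branch target and cycle count.
def branch(pc: int, offset: int, taken: bool) -> tuple[int, int]:
    if not taken:
        return pc, 2
    pch, pcl = pc >> 8, pc & 0xFF
    negative = offset >> 7              # NEGATIVE = DL bit 7
    total = pcl + offset                # unsigned add into PCL
    pcl, cl = total & 0xFF, total >> 8  # (PCL, CL)
    if negative == cl:                  # same page: no PCH fix-up
        return (pch << 8) | pcl, 3
    pch = (pch + (-1 if negative else 1)) & 0xFF
    return (pch << 8) | pcl, 4
```

For instance, a backward branch of -4 from $8002 (`branch(0x8002, 0xFC, True)`) crosses into page $7F and takes 4 cycles, landing on $7FFE.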
Indirect (Ind) — 5 cycles, JMP only
2 read [PC] → IAL, PC++ (pointer low, parked)
3 read [PC] → ADH, PC++ (pointer high → ADH)
4 read [ADH:IAL] → PCL (target low byte)
5 IAL = IAL + 1 (no carry into ADH — bug)
read [ADH:IAL] → PCH (target high byte)
JMP ($02FF) reads the high byte from $0200, not $0300. The
increment of IAL at cycle 5 doesn't carry into ADH — classic 6502
bug, not fixed in the 2A03.
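The indirect-JMP bug comes down to incrementing only the low byte of the pointer. A hypothetical sketch (`jmp_indirect_target` is an illustrative name; flat memory assumed):

```python
# Hypothetical helper: target of JMP (ptr), including the page-wrap bug.
def jmp_indirect_target(mem: bytes, ptr: int) -> int:
    ial, adh = ptr & 0xFF, ptr >> 8
    lo = mem[(adh << 8) | ial]   # cycle 4: target low byte
    ial = (ial + 1) & 0xFF       # cycle 5: no carry into ADH (the bug)
    hi = mem[(adh << 8) | ial]   # target high byte, same page as the pointer
    return (hi << 8) | lo


mem = bytearray(0x10000)
mem[0x02FF] = 0x00
mem[0x0200] = 0x80  # read by the real CPU
mem[0x0300] = 0x90  # what a "fixed" CPU would read
```

With this memory, `jmp_indirect_target(mem, 0x02FF)` returns $8000, not $9000.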
LDA — load accumulator
A := M. Flags: N - - - - - Z -.
| Mode | Op |
|---|---|
| Imm | A9 |
| ZP | A5 |
| ZPX | B5 |
| Abs | AD |
| AbsX | BD |
| AbsY | B9 |
| IzX | A1 |
| IzY | B1 |
STA — store accumulator
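M := A. Flags: unaffected. No immediate mode. The last cycle of each mode breakdown becomes a write of A instead of a read into DL. The indexed modes always take their longest path (AbsX and AbsY 5 cycles, IzY 6): the CPU always does the dummy read at the uncorrected address and then writes to the corrected one, since writing before the fix-up could hit the wrong location.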
| Mode | Op |
|---|---|
| ZP | 85 |
| ZPX | 95 |
| Abs | 8D |
| AbsX | 9D |
| AbsY | 99 |
| IzX | 81 |
| IzY | 91 |
BIT — bit test
Z := (A AND M) == 0, N := M[7], V := M[6]. A is not modified.
| Mode | Op |
|---|---|
| ZP | 24 |
| Abs | 2C |
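BIT's flag rules take the memory byte's top two bits directly and only use A for the Z test. A hypothetical sketch (`bit_flags` is an illustrative name) returning the three affected flags:

```python
# Hypothetical helper: flags produced by BIT; A itself is not modified.
def bit_flags(a: int, m: int) -> dict[str, bool]:
    return {
        "N": bool(m & 0x80),  # N := M[7]
        "V": bool(m & 0x40),  # V := M[6]
        "Z": (a & m) == 0,    # Z := (A AND M) == 0
    }
```

For example, `bit_flags(0x01, 0xC0)` sets all three flags: N and V come from $C0, and $01 AND $C0 is zero.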
ADC — add with carry
A := A + M + C. Flags: N V - - - - Z C. C is set when the unsigned sum exceeds $FF. V is set when the signed result doesn't fit in -128..127, which happens exactly when A and M have the same sign and the result's sign differs from it.
| Mode | Op |
|---|---|
| Imm | 69 |
| ZP | 65 |
| ZPX | 75 |
| Abs | 6D |
| AbsX | 7D |
| AbsY | 79 |
| IzX | 61 |
| IzY | 71 |
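The carry and overflow rules translate into a few bit operations. This is a hypothetical sketch (`adc` is an illustrative name) of binary-mode ADC only, which matches the 2A03 since it has no decimal mode.

```python
# Hypothetical helper: binary ADC, returning (new A, flags).
def adc(a: int, m: int, carry: int) -> tuple[int, dict[str, bool]]:
    total = a + m + carry
    result = total & 0xFF
    flags = {
        "C": total > 0xFF,                               # unsigned overflow
        "V": bool(~(a ^ m) & (a ^ result) & 0x80),       # same-sign inputs, sign flipped
        "Z": result == 0,
        "N": bool(result & 0x80),
    }
    return result, flags
```

`adc(0x50, 0x50, 0)` gives $A0 with V set (80 + 80 doesn't fit in a signed byte) and C clear, while `adc(0xFF, 0x01, 0)` gives $00 with C and Z set and V clear.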
INX — increment X
X := X + 1. Flags: N - - - - - Z -. Implied only.
| Mode | Op |
|---|---|
| Imp | E8 |
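Because X is 8 bits wide, INX wraps from $FF to $00, and N/Z are computed from the wrapped value. A hypothetical sketch (`inx` is an illustrative name):

```python
# Hypothetical helper: INX, returning (new X, N, Z).
def inx(x: int) -> tuple[int, bool, bool]:
    x = (x + 1) & 0xFF                 # $FF + 1 wraps to $00
    return x, bool(x & 0x80), x == 0   # N from bit 7, Z if the result is zero
```

`inx(0xFF)` returns `(0, False, True)`, and `inx(0x7F)` returns `(0x80, True, False)`.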
JSR — jump to subroutine
Single opcode 20. 6 cycles.
2 read [PC] → ADL, PC++ (target low parked in ADL)
3 read [01:SP] (internal operation)
4 write [01:SP] ← PCH; SP--
5 write [01:SP] ← PCL; SP--
6 read [PC] → PCH; PCL = ADL
The address pushed is that of JSR's high operand byte (the JSR opcode's address + 2), not the start of the next instruction. RTS pops it and adds 1, so the round trip lands on the byte right after the JSR.
RTS — return from subroutine
Single opcode 60. 6 cycles.
2 read [PC] (discarded)
3 read [01:SP] (discarded)
4 SP++; read [01:SP] → PCL
5 SP++; read [01:SP] → PCH
6 read [PCH:PCL] (discarded); PC++
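The JSR/RTS round trip can be checked with a small model. This is a hypothetical sketch (`Machine`, `jsr`, `rts` are illustrative names; flat memory and SP = $FF are assumptions) that collapses the cycles into one call each but keeps the push/pull order and the off-by-one described above.

```python
# Hypothetical model of the JSR/RTS stack round trip.
class Machine:
    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF  # assumed starting value

    def push(self, value: int) -> None:
        self.mem[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 | self.sp]

    def jsr(self) -> None:
        """Run JSR with PC at its opcode."""
        self.pc = (self.pc + 1) & 0xFFFF   # cycle 1: opcode fetched
        adl = self.mem[self.pc]            # cycle 2: target low -> ADL
        self.pc = (self.pc + 1) & 0xFFFF
        # cycle 3: internal read of [01:SP], no visible effect here
        self.push(self.pc >> 8)            # cycle 4: PCH (PC = high operand byte)
        self.push(self.pc & 0xFF)          # cycle 5: PCL
        adh = self.mem[self.pc]            # cycle 6: target high
        self.pc = (adh << 8) | adl

    def rts(self) -> None:
        lo = self.pull()                   # cycle 4
        hi = self.pull()                   # cycle 5
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF  # cycle 6: PC++


m = Machine()
m.mem[0x8000:0x8003] = bytes([0x20, 0x00, 0x90])  # JSR $9000
m.pc = 0x8000
m.jsr()  # PC = $9000; $8002 is on the stack
m.rts()  # PC = $8003, the byte right after the JSR
```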
BNE — branch if not equal (Z == 0)
Relative mode. Cycle cost per the Rel breakdown above.
| Mode | Op |
|---|---|
| Rel | D0 |
JMP — unconditional jump
| Mode | Op |
|---|---|
| Abs | 4C |
| Ind | 6C |
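Notes: absolute JMP takes 3 cycles, one fewer than the Abs breakdown: cycle 3 loads the high operand byte straight into PCH (with PCL = ADL), and the target address is never read. Indirect JMP has the page-boundary bug detailed in the Ind mode section.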
What's not on this page yet
The remaining ~45 documented mnemonics follow the same shape. The two
interesting outliers are the illegal opcodes (undocumented but real
operations like LAX, SAX) and the interrupt instructions
(BRK, RTI, plus the reset/IRQ/NMI vector machinery), which are
state-machine topics rather than pure instruction encodings.