Subroutines
The F8 has no hardware call stack: PI saves the return address in a single register, PC1 (the stack register, written P in LR K,P), so you must be careful when calling subroutines. PI/POP therefore only works for one level of subroutines, because each PI overwrites the return address saved by the previous one. Here's a single-level example:
  prog:
      ; ...do something...
      pi sub       ; PC1 <- return address, PC0 <- sub
      ; ...do more...
  sub:
      ; ...do something...
      pop          ; PC0 <- PC1: return to prog
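To see why a second level fails, here is a deliberately broken sketch in which sub1 calls sub2 with plain PI/POP. The names sub1, sub2, and back are illustrative, and the comments assume POP copies PC1 into PC0 without changing PC1:

  prog:
      pi sub1      ; PC1 <- return address in prog
      ; ...never reached again...
  sub1:
      pi sub2      ; PC1 <- return address in sub1; prog's address is lost
  back:
      pop          ; PC0 <- PC1, which still points at "back": loops forever
  sub2:
      pop          ; returns to "back" in sub1

The return address for prog was held only in PC1, so once sub1 issues its own PI there is no way back.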
For two levels of subroutines, save the first return address in the K register (scratchpad registers 12 and 13) with LR K,P, and return with PK, which jumps to the address in K:
  prog:
      ; ...do something...
      pi sub1
      ; ...do more...
  sub1:
      lr k,p       ; K <- PC1 (return address in prog)
      ; ...do something...
      pi sub2
      ; ...do more...
      pk           ; PC0 <- K: return to prog
  sub2:
      ; ...do something...
      pop
That's as deep as the processor lets you go without extra code to save return addresses. The Channel F BIOS contains routines that maintain a software stack for the K register: the routine at $0107 (known as PUSHK or CALL) pushes K onto the stack, and the routine at $011E (known as POPK or RTRN) pops K from it. For example:
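  PUSHK = $0107    ; BIOS: push K onto the software stack
  POPK  = $011E    ; BIOS: pop K from the software stack

  prog: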
      ; ...do something...
      pi sub1
      ; ...do more...
  sub1:
      lr k,p
      pi PUSHK     ; save this level's return address on the BIOS stack
      ; ...do something...
      pi sub2
      ; ...do more...
      pi POPK      ; restore the return address into K
      pk
  sub2:
      lr k,p
      pi PUSHK
      ; ...do something...
      pi sub3
      ; ...do more...
      pi POPK
      pk
PUSHK/POPK allow more than two levels of subroutine calls, but manipulating the stack adds considerable overhead, since every call and return needs an extra PI into the BIOS. For one level of subroutines, use the PI/POP combination; for two levels, use the K-register method from the second example above.
Also consider using macros instead of subroutines. You have much more program space than the original Channel F programmers did, so you might as well use it: inlined code avoids the call overhead entirely, and the time you save can be considerable.
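A minimal sketch of the idea, assuming the DASM assembler's MAC/ENDM syntax; SWAP01 is a hypothetical helper that would otherwise be a small subroutine. Each use expands in place, so no PI, POP, or K bookkeeping is needed, at the cost of repeating six one-byte instructions every time it is used:

  ; Hypothetical macro: swap scratchpad registers 0 and 1, using 2 as a temporary
      MAC SWAP01
      lr a,0       ; A <- r0
      lr 2,a       ; r2 <- r0 (temporary copy)
      lr a,1       ; A <- r1
      lr 0,a       ; r0 <- r1
      lr a,2       ; A <- old r0
      lr 1,a       ; r1 <- old r0
      ENDM

  prog:
      ; ...do something...
      SWAP01       ; expands to the six instructions above, no call needed
      ; ...do more...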
Blackbird is writing more efficient versions of PUSHK/POPK (Snippet:KStack). Another idea is to write a version that keeps the stack in the Schach cartridge's RAM at $2800, which MESS emulates; that would free up more scratchpad registers.
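Along those lines, here is an untested sketch of a K stack in external RAM. It assumes writable RAM at $2800 (as on the Schach cartridge) and assumes the program can reserve DC1 permanently as the stack pointer; PUSHK2 and POPK2 are hypothetical names, not BIOS routines. They are called like the BIOS versions (PI, returning with POP) and clobber A:

  init:
      dci $2800    ; DC0 <- bottom of the stack
      xdc          ; DC1 <- stack pointer (DC1 is reserved from here on)
      ; ...rest of program...

  PUSHK2:          ; push K; call with PI
      xdc          ; DC0 <- stack pointer, DC1 <- caller's DC0
      lr a,ku
      st           ; (DC0) <- KU, DC0 <- DC0+1
      lr a,kl
      st           ; (DC0) <- KL, DC0 <- DC0+1
      xdc          ; DC1 <- new stack pointer, DC0 restored
      pop

  POPK2:           ; pop K; call with PI
      xdc          ; DC0 <- stack pointer
      li $FE       ; -2 as a signed byte
      adc          ; DC0 <- DC0-2: point at the saved KU
      lm
      lr ku,a      ; KU <- (DC0), DC0 <- DC0+1
      lm
      lr kl,a      ; KL <- (DC0), DC0 <- DC0+1
      li $FE
      adc          ; DC0 <- DC0-2: the popped slot is free again
      xdc          ; DC1 <- new stack pointer, DC0 restored
      pop

ADC treats A as a signed value, which is why loading $FE moves the pointer back by two bytes.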
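Here's a trick from the Guide: if a subroutine will be called frequently, it's quicker to load its address into the K register once and call it with PK than to use PI each time. PK is a one-byte opcode against PI's three, and it saves the return address in PC1 just as PI does, so the subroutine still returns with POP.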
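A sketch of the trick, loading K through DC0 and Q (this clobbers A, Q, and DC0; sub is a placeholder name). The subroutine must leave K unchanged so the next PK still finds its address, and because PK overwrites PC1, the caller should not be holding its own return address there:

  prog:
      dci sub      ; DC0 <- address of sub
      lr q,dc      ; Q <- DC0
      lr a,qu
      lr ku,a      ; KU <- high byte of sub's address
      lr a,ql
      lr kl,a      ; KL <- low byte; K now holds sub's address
      ; ...do something...
      pk           ; call sub: PC1 <- return address, PC0 <- K
      ; ...do more...
      pk           ; call sub again: 1 byte instead of PI's 3
  sub:
      ; ...do something (without changing K)...
      pop          ; PC0 <- PC1: return to the caller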
