Programming tutorial: Part 18–Files

Atari Lynx programming tutorial series:

In part 16 we investigated the way cartridges work at a reasonably low level and how the CC65 libraries help out by providing functions that read from the cartridge from a start position onwards. In this part we will take this one step further and look at the file system that Epyx introduced and that is still used in most cartridges. Also, we will see how such files are created and put into the ROM image. That will bring us right back to segments and memory from part 15.

All kinds of images

So far we haven’t really looked at how ROM images are organized. There are quite a few types of images available. Most of these images are related to the development kit that was used to create them.

Development kit   File extension   Description
Epyx              .bin             Uploadable file for Pinky or Howard
Epyx              .rom             ROM image with unencrypted header
BLL               .o (or .obj)     Uploadable file, not to be confused with the CC65 object files (also .o extension)
CC65              .lyx             Headerless ROM image
CC65              .lnx             .lyx file with Handy header

Two of these image types are more alike than you might expect. Even though they come from different development kits, they use the same directory and file structures. The Epyx .rom and the .lyx images (and therefore also the CC65 .lnx images) all have a directory that lists one or more files for the game to load from the cartridge into RAM. The Epyx development kit introduced this as a (partially optional) means of partitioning content on a cartridge. You could choose not to use it, but you must have at least one file entry to make a bootable game on real Lynx hardware with the original encrypted headers. If you create your own boot loader in the encrypted header of a cartridge, you could dispose of the file entries and the directory altogether. As it turned out, the directory structure proved to be both useful and relatively easy to work with, and most segmented games have used it ever since. The functions to load files from the cartridge have been put to the test extensively and work like a charm.

Directory and files

The contents or binary image of a cartridge typically has at least three distinct areas:

  1. Encrypted header
  2. Directory structure
  3. Individual files containing code or data

Each of these areas consists of one or more items. Here’s a graphical view of what you could find in an arbitrary, generic retail cartridge for the Lynx.

image

The three areas are marked in green, orange and grey. The header with its encrypted two-stage loader is a requirement for games that need to run on actual hardware. There are variations on this theme (it does not have to be a two-stage loader per se), but the original retail releases all had them. The directory holds the file entries, which in essence are offset pointers to the “files” on the cartridge. These files follow the directory, and there can be up to 256 of them. Typical for the Epyx two-stage loader is that the first file holds (must hold) the title screen that is shown while the Lynx and the game boot. The second one is the main executable. After that there can be any number of files holding code, data, music, graphics, whatever.

Here is a zoomed in picture of the directory and file entries pointing to the various files:

image

The file entries contain the information to find the start of a file’s bytes on the cartridge, expressed in terms of cartridge page and offset in the page. Three bytes are used for that: one byte for the page number, as there are at most 256 pages in a cartridge, then two bytes for the offset, because the page size can be as much as 2048 (sometimes 4096) bytes.

The other things that a file entry has are the length of the file in number of bytes and the load address in memory. That’s another two bytes for the length (no use to have files longer than $10000 because that is the maximum RAM memory size, assuming you do not intend to stream from the cartridge) and two for the load address in the $FFFF address space. That’s 7 bytes, plus an 8th byte that is used as a reserved flag with value 0x00. It is not actually used in the Epyx style of file entries, although the BLL style file entries use it to mark it as an executable (0x88) or normal data file (0x00).

The structure of a file entry in bytes looks like this (it’s the first file entry for APB in case you want to know):

image

You can see the offset, length and load address in LSB, MSB format. This particular file entry of APB is used to load the title screen from the cartridge at the first (zero) cartridge page with a 666 (0x29A) bytes offset. The title screen has a length of 3291 (0xCDB) bytes and is loaded at 0x8000 in RAM memory.

The next file entry in the directory is located right after the first and usually points to the file directly after the first file. Some calculations are in order. If the first entry points to page 0, offset 0x29A for a file 3291 bytes long, then it occupies bytes 666 to 666+3291-1 = 3956. The next file should start at 3957 (0xF75). APB is a 256 KB cartridge, so each page is 1024 bytes. Taking FLOOR(3957 / 1024) = 3 means that it starts in page 3 of the cartridge (counting from zero). The 3 preceding pages take 3072 bytes, which means that location 3957 has an offset of 3957-3072 = 885 or 0x375 in hexadecimal. How did we do? Let’s check if we calculated that correctly. This is the next file entry of APB for the resident startup code:

image

Refer back to the previous picture and you should see that it indeed shows that the second file is located in page 3, 0x375. It is 0x1B4 bytes long and loaded at 0x0400 (which is typical as it is right behind the loader code from 0x200 to 0x3FF). You could repeat the process to calculate the next file location from there and keep continuing until you reach the end of the directory and the start location of the first file.
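The calculation above can be sketched in a few lines of C. It is only an illustration and assumes that files are stored back to back on the cartridge. The names cart_location and next_file_location are made up for this example and are not part of CC65.

```c
#include <stdint.h>

/* Hypothetical helper type: a position on the cartridge. */
typedef struct {
    uint8_t  page;    /* cartridge page number (0-255) */
    uint16_t offset;  /* byte offset inside that page */
} cart_location;

/* Returns where the next file starts when it directly follows a file
   that begins at 'start' and is 'length' bytes long.
   'page_size' is the page size of the cartridge in bytes (1024 for APB). */
cart_location next_file_location(cart_location start, uint32_t length,
                                 uint32_t page_size)
{
    /* Absolute position of the first byte after the file. */
    uint32_t absolute = (uint32_t)start.page * page_size + start.offset + length;
    cart_location next;

    next.page   = (uint8_t)(absolute / page_size);   /* integer division rounds down */
    next.offset = (uint16_t)(absolute % page_size);
    return next;
}
```

Feeding it the APB title screen (page 0, offset 0x29A, length 3291, page size 1024) gives page 3 and offset 0x375, the same values we found by hand.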

Note that it doesn’t always work out like this with files being contiguously located on the cartridge. You might decide that a file needs to start at a page boundary, meaning an offset of zero. This is much more efficient for loading, as you do not have to do dummy reads to advance the counter. The cartridge will end up with some unused space (call it fragmentation) caused by gaps from the end of a file to the start of the next page. It’s a trade-off between compactness/space and efficiency of loading files.
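To get a feeling for the cost of page alignment, here is a small sketch that rounds a cartridge position up to the next page boundary and reports the gap. The function names are hypothetical and only serve this example.

```c
#include <stdint.h>

/* Rounds a cartridge position up to the start of the next page.
   A position that already sits on a page boundary is left unchanged. */
uint32_t align_to_page(uint32_t position, uint32_t page_size)
{
    return ((position + page_size - 1) / page_size) * page_size;
}

/* Number of unused bytes between 'position' and the next page boundary. */
uint32_t padding_to_page(uint32_t position, uint32_t page_size)
{
    return align_to_page(position, page_size) - position;
}
```

If the second APB file had been aligned, it would have started at 4096 instead of 3957, leaving 139 bytes unused on a cartridge with 1024-byte pages.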

Loading files

All right, so far we know that each file entry is 8 bytes in size. You can find a particular file entry by skipping the 8-byte sets. For example, to read file number 3 (zero-based numbering) you would skip 3*8=24 bytes from the start of the directory, bypassing file numbers 0, 1 and 2, and read the 8-byte tuple for the 4th entry. From these bytes the cartridge’s shift register is set for the page number and the dummy reads to advance the ripple counter are done. After that you copy the byte from every read of RCART0 to the destination RAM address until you have copied as many bytes as the length specifies.
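As an illustration of the lookup, the sketch below decodes one entry from a copy of the directory held in an array. The byte order (page, offset, flag, load address, length) is an assumption taken from the order in which the directory.asm macro later in this part emits its bytes. The type file_entry and the function read_entry are hypothetical names, not CC65 functions.

```c
#include <stdint.h>
#include <stddef.h>

#define ENTRY_SIZE 8  /* each file entry takes 8 bytes */

/* Hypothetical in-memory view of one file entry. */
typedef struct {
    uint8_t  page;       /* start page on the cartridge */
    uint16_t offset;     /* offset inside the start page */
    uint8_t  flag;       /* reserved (Epyx) or executable marker (BLL) */
    uint16_t load_addr;  /* destination address in RAM */
    uint16_t length;     /* file length in bytes */
} file_entry;

/* Combines two bytes stored in LSB, MSB order into a 16-bit value. */
static uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decodes entry number 'fileno' (zero-based) from a copy of the directory.
   Assumed layout: page, offset, flag, load address, length. */
file_entry read_entry(const uint8_t *directory, size_t fileno)
{
    const uint8_t *raw = directory + fileno * ENTRY_SIZE;  /* skip earlier entries */
    file_entry e;

    e.page      = raw[0];
    e.offset    = read_word(raw + 1);
    e.flag      = raw[3];
    e.load_addr = read_word(raw + 4);
    e.length    = read_word(raw + 6);
    return e;
}
```

With a directory built from the APB values quoted earlier, in the assumed order, entry 0 decodes to page 0, offset 0x29A, load address 0x8000 and length 0xCDB.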

In CC65 all logic to do the preparations (shift register and ripple counter) has been captured in the assembler code from lynx-cart.s. It holds the functions to skip bytes, read a single byte or a number of bytes, plus page (block) selection. To load individual files as specified by a file entry in the directory you need the lynx_load function from load.s. This function is available by using the include file lynx.h. I couldn’t imagine a scenario where you haven’t already included the file, so you should be good to go.

/*****************************************************************/
/*                     Accessing the cart                        */
/*****************************************************************/

void __fastcall__ lynx_load(int fileno);
/* Load a file into ram. The first entry is fileno=0. */

void __fastcall__ lynx_exec(int fileno);
/* Load a file into ram and execute it. */

The lynx_load function is really simple to use: pass the file number and the file’s bytes will be read into RAM memory as specified in the file entry. Typically you will use define statements to give symbolic, readable names to the file numbers. More on that later.

Laying out your memory

Before we can start creating a directory structure we need to know how CL65 as a linker will create an image and how our “files” are laid out in that big binary blob. You see, CL65 does not know nor care about files. It views your code and resources in terms of segments that are loaded into memory. You might want to refresh your knowledge of that by rereading the tutorial part on “Memory and Segments” before continuing.

Of the 64KB of memory that the Lynx has, a smaller amount is available after the video buffers and the C stack have been assigned.

image

If you have only a little code and data that will all fit in the remaining memory, you will probably not bother with files. But just imagine the challenges that arise when you have more code and data than fit into memory at one point in time. You will need to plan what will be in memory when. From that you will create a memory plan that shows the various memory areas that exist.

The special areas, such as zeropage, stacks and the video buffers, are required for any Lynx game to function. This is regardless of the development kit you are using, except maybe for the C-stack. From here on these are not shown anymore, even though they might be located elsewhere in the memory space than in the initial pictures. Their location is fixed throughout the lifetime and state of the game and usually at the bottom and top of the 64K memory range, as shown in the picture.

Looking at an example

The example game we will use is a game that starts with an intro, consisting of a title screen and some music. Pressing A will start the game, where you will battle some nasty aliens, listening to different, exciting music. When you lose all your lives in the game, you will see a game over screen that plays some sad music. After this you will return to the intro screen and this loop starts over. The core of your program will always be in memory and in charge of loading and running the other parts of the program. There is also a resident part related to the C-runtime, because we are using C and CC65.

Perhaps a good way to design your memory areas is to start from the building blocks of your game. You pick your building blocks (intro, main game, outro, music, et cetera) and determine their required memory size. Next you need to place these blocks in the available memory space. If the sizes required by the blocks are small enough you can create a single memory area that accommodates all blocks at the same time. You would have the intro, outro, game and all three music pieces in memory simultaneously. Lucky you.

image

In this case, you create one big memory area that spans from $0200 to $BE38 and place all code and data in there. You could read it as one big blob from cartridge and execute it. If you have come this far in the tutorial series and have already created a game without having to use the lynx_load function, then you have done this already.

Here is the MEMORY definition from the lynxcart.cfg file that shows a single RAM section. The area that holds the C-runtime and required startup code and data must be named RAM. We saw this before when we talked about memory and segments.

MEMORY {
  ZP:     file = "", define = yes, start = $0000, size = $0100;
  HEADER: file = %O, start = $0000, size = $0040;
  BOOT:   file = %O, start = $0200, size = __STARTOFDIRECTORY__;
  DIR:    file = %O, start = $0000, size = 5*8;
  RAM:    file = %O, define = yes,
          start = $0200, size = __VIDEO__ - __CSTACK__ - $0200;
}

However, if you find that it does not fit, you will have to time-share the memory space in a smart way. That means creating areas where the “occupants” are not in their assigned “apartment” at the same time.

It is like a puzzle: what should be together in memory when? The resident part will always be in memory and never be unloaded or replaced. The functional parts however will occasionally be in memory and not necessarily at the same time. That’s a good thing, because when the total size of these parts exceeds the available memory it wouldn’t be possible anyway.

The following picture shows possible layouts of memory at three points in time for our game that has an intro, actual game and a game over outro screen, all with different music to play:

image

Let’s look at the three layouts in a bit more detail:

  • Intro
    The top layout shows the intro in memory (yellow), next to the intro music and the core part plus some additional resident stuff (all blue) that the C runtime needs.
  • Main game
    During gameplay the intro is replaced with the game code and resources. There will be different music, but the area in memory is the same. It’s just that other music will be loaded.
  • Game over
    Essentially the same as the intro, except with other code and resources, and different music.

For now the important parts are those that vary, and they are represented as the yellow and dark blue areas. The resident block is not free to use to your liking, but reserved. Notice how the amount of code and data loaded into the area before music differs. The intro requires less space, but the main part of the game uses the full amount of memory available in that area.

The core is responsible for loading the intro, game and outro. It should also load the correct music data. Because the music will always be playing regardless of the state the game is in, the code to play the music must always be present. That means that either the core or music area should hold the code to play the music. Placing it in the core is more efficient (placing it in each music area means it has to be loaded each time and must be present on the cartridge three times as well). The yellow areas are self-supporting parts of the game that have all code and data inside their respective memory blocks.

When designing your memory layout you need to be aware that there is a difference between an area of memory and the code and data that is loaded there at a particular point in time. Looking at our scenario you might be tempted to think that there is 1 memory area for the yellow blocks, from $0200 to around $7000, which we will name YELLOW for the sake of argument. But in actuality there are three memory areas and YELLOW is not relevant. The memory areas are INTRO, MAINGAME and OUTRO and they just happen to be positioned on top of each other.

You could create a flexible layout that gives the music as much space as is available after the yellow area. That’s a valid option, in which case INTRO would be $0200-$5000, MAINGAME $0200-$7000 and OUTRO $0200-$6000. Instead, our approach keeps the memory sizes fixed at $0200-$7000 across the three game states, because the layout does not require the empty space left by INTRO and OUTRO. The same approach is taken for MUSIC. It is actually three areas (MUSIC1, MUSIC2 and MUSIC3), because the music loaded at each time is different. They can be the same size even though the exact sizes of music pieces 1, 2 and 3 are unlikely to be exactly the same.

image

We will compute the yellow blocks from the start of available memory $0200 and place the music right after it. The required RAM area needs to be located at the end of available memory right before the C-stack. Therefore, it is calculated from $BE38 backwards.

We will need some symbols to make things readable, flexible and maintainable. The items of interest for now are __RESIDENT__, __GAMESIZE__ and __MUSICSIZE__. They indicate the sizes of our various blocks.

SYMBOLS {
  __STACKSIZE__: type = weak, value = $0800;
  __STARTOFDIRECTORY__: type = weak, value = $00CB;
  __BLOCKSIZE__: type = weak, value = 2048;
  __EXEHDR__:    type = import;
  __BOOTLDR__:   type = import;
  __RESIDENT__: type = weak, value = $1000;
  __GAMESIZE__: type = weak, value = $3000;
  __MUSICSIZE__: type = weak, value = $1000;
  __VIDEO__: type = weak, value = $BE38;
}

With these symbols we can calculate the memory requirements for each of our blocks. The MEMORY section then looks like this:

MEMORY {
  ZP:      file = "", define = yes, start = $0000, size = $0100;
  HEADER:  file = %O, start = $0000, size = $0040;
  BOOT:    file = %O, start = $0200, size = __STARTOFDIRECTORY__;
  DIR:     file = %O, start = $0000, size = 5*8;
  RAM:     file = %O, define = yes, size = __RESIDENT__,
           start = __VIDEO__ - __STACKSIZE__ - __RESIDENT__;
  INTRO:   file = %O, define = yes, start = $0200, size = __GAMESIZE__;
  MAIN:    file = %O, define = yes, start = $0200, size = __GAMESIZE__;
  OUTRO:   file = %O, define = yes, start = $0200, size = __GAMESIZE__;
  MUSIC1:  file = %O, define = yes, start = $0200 + __GAMESIZE__,
           size = __MUSICSIZE__;
  MUSIC2:  file = %O, define = yes, start = $0200 + __GAMESIZE__,
           size = __MUSICSIZE__;
  MUSIC3:  file = %O, define = yes, start = $0200 + __GAMESIZE__,
           size = __MUSICSIZE__;
}
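It can help to work out the resulting addresses once by hand, or with a small host-side sketch like the one below. It simply repeats the arithmetic of the start and size expressions with the values from the SYMBOLS section. The area type and the function names are hypothetical and only exist for this illustration.

```c
#include <stdint.h>

/* Values copied from the SYMBOLS section above. */
#define STACKSIZE  0x0800u
#define RESIDENT   0x1000u
#define GAMESIZE   0x3000u
#define MUSICSIZE  0x1000u
#define VIDEO      0xBE38u

/* Hypothetical description of a memory area: first address and size. */
typedef struct {
    uint16_t start;
    uint16_t size;
} area;

/* The intro, main game and game over areas all share this range. */
area game_area(void)  { area a = { 0x0200u, GAMESIZE }; return a; }

/* MUSIC1, MUSIC2 and MUSIC3 all start right after the game area. */
area music_area(void) { area a = { 0x0200u + GAMESIZE, MUSICSIZE }; return a; }

/* RAM is counted backwards from the video buffers and the C stack. */
area ram_area(void)   { area a = { VIDEO - STACKSIZE - RESIDENT, RESIDENT }; return a; }

/* Returns 1 when area 'low' ends at or before the start of area 'high'. */
int fits_below(area low, area high)
{
    return (uint32_t)low.start + low.size <= high.start;
}
```

With these numbers the game area runs from $0200 up to $3200, the music follows from $3200 up to $4200 and the resident RAM area starts at $A638, so none of the areas that are in memory at the same time overlap.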

From segments to memory to files

After laying out the memory requirements and areas we essentially have a few buckets to put our code and data in. At this point we return to where we left off talking about memory and segments. By assigning code and data to the various segments, we can fill the buckets with all types of segments: CODE, DATA and RODATA. BSS segments are never placed in binary files, because they are all uninitialized memory anyway and nothing more than a memory range.

The first part should look familiar:

SEGMENTS {
  EXEHDR:   load = HEADER, type = ro;
  BOOTLDR:  load = BOOT,   type = ro;
  DIRECTORY:load = DIR,    type = ro;
  STARTUP:  load = RAM,    type = ro,  define = yes;
  LOWCODE:  load = RAM,    type = ro,                optional = yes;
  INIT:     load = RAM,    type = ro,  define = yes, optional = yes;
  CODE:     load = RAM,    type = ro,  define = yes;
  RODATA:   load = RAM,    type = ro,  define = yes;
  DATA:     load = RAM,    type = rw,  define = yes;
  BSS:      load = RAM,    type = bss, define = yes;
  ZEROPAGE: load = ZP,     type = zp;
  EXTZP:    load = ZP,     type = zp,                optional = yes;
  APPZP:    load = ZP,     type = zp,                optional = yes;

It is the usual definition of the required segments that go into the standard memory areas. For the game we’ve been discussing some additional segments are required.

  # Intro
  INTRO_CODE:    load = INTRO,  type = ro,  define = yes;
  INTRO_RODATA:  load = INTRO,  type = ro,  define = yes;
  INTRO_DATA:    load = INTRO,  type = rw,  define = yes;
  INTRO_BSS:     load = INTRO,  type = bss, optional = yes;
  # Outro
  OUTRO_CODE:    load = OUTRO,  type = ro,  define = yes;
  OUTRO_RODATA:  load = OUTRO,  type = ro,  define = yes;
  OUTRO_DATA:    load = OUTRO,  type = rw,  define = yes;
  OUTRO_BSS:     load = OUTRO,  type = bss, optional = yes;
  # Main game
  MAIN_CODE:     load = MAIN,   type = ro,  define = yes;
  MAIN_RODATA:   load = MAIN,   type = ro,  define = yes;
  MAIN_DATA:     load = MAIN,   type = rw,  define = yes;
  MAIN_BSS:      load = MAIN,   type = bss, optional = yes;
  # Music
  MUSIC1_RODATA: load = MUSIC1, type = ro,  define = yes;
  MUSIC2_RODATA: load = MUSIC2, type = ro,  define = yes;
  MUSIC3_RODATA: load = MUSIC3, type = ro,  define = yes;
}

It is actually not that special. All code and data in the INTRO_ prefixed segments are assigned to go into the INTRO memory area. The same is true for the MAIN_ and OUTRO_ segments. You might notice how the MUSIC1, 2 and 3 areas only have a RODATA segment defined, as the music itself is just read-only data, not changing data, nor code.

You can safely assume that the linker will place the segments in a memory area in a certain order. How they are laid out is not really relevant, as long as you make sure you have loaded the code and data into the area before you start using it by calling the code or referencing the data.

What is more important is that the linker will create the binary file that holds all code and data in the order that the memory areas have been defined in the MEMORY section. The linker will emit the binary image with “files” for each of the areas that have a file=%O in their definition.

HEADER: file = %O, start = $0000, size = $0040;

This attribute will make the linker emit the contents of that memory area in the final file. You can see how it is referring to %O for each area, effectively appending the contents to the same, single output file. This file will be called whatever you have set as your target in the make file.

target = tutorial-files.lnx
objects = lynx-160-102-16.o lynx-stdjoy.o \
          tutorial.o
$(target) : $(objects)
	$(CL) -t $(SYS) -o $@ $(objects) lynx.lib

Note how the target is passed into the linker statement through $@. The statement will look like this when expanded:

cl65.exe -t LYNX -o tutorial-files.lnx lynx-160-102-16.o lynx-stdjoy.o tutorial.o lynx.lib

Long story short: the target is the file that is called %O in the memory area. In it all memory areas are written in the order in which they are declared. The order of the segments per area is not relevant.

Entries in a directory

The final thing that keeps us from having a fully functional file system is a directory. The purpose of the directory is to list the “files” and their respective positions within the binary image. In the end, the virtual file system of an Atari Lynx cartridge is nothing more than make believe. When we are able to compute the file entries like we saw at the beginning of this part, we are good to go.

The directory.asm file builds no code, just data representing the file entries. Here is the skeleton of the file, where details have been omitted for now.

.include "lynx.inc"
.import __STARTOFDIRECTORY__

; More imports
.segment	"DIRECTORY"
__DIRECTORY_START__:
; File entries go here
__DIRECTORY_END__:

The directory is created in a segment called DIRECTORY. It will hold the 8 byte entries that indicate where the files are located in the binary image on the cartridge.

You can see how start and end address symbols (__DIRECTORY_START__ and __DIRECTORY_END__) are declared. These are used to compute the length of the directory itself. The first entry is our RAM area where the C-runtime and other resident code and data are located.

image

; Entry 0 - Resident executable (RAM)
off0=__STARTOFDIRECTORY__+(__DIRECTORY_END__-__DIRECTORY_START__)
blocka=off0/__BLOCKSIZE__
len0=__STARTUP_SIZE__+__INIT_SIZE__+__CODE_SIZE__+__DATA_SIZE__+__RODATA_SIZE__
	.byte	<blocka
	.word	off0 & (__BLOCKSIZE__ - 1)
	.byte	$88
	.word	__RAM_START__
	.word	len0

First, the offset of the RAM area off0 is computed, by taking the start location of the directory and adding the length of the directory to it. The page number is computed from the blocks already taken up. Next the length of the current area is calculated by adding the startup and initialization segment sizes, plus the code, data and read-only data segments. All of these were defined to go into the RAM area. Finally the file entry is added as raw bytes with the .byte and .word statements. Take a look at the picture above to see that it creates the right data. Normally you do not need to change this first entry at all. It is special in that it has multiple parts and requires the total size to be computed from them.

The next file entries are more of the same. To help create them easily, a macro is included in the default directory.asm file.

.macro entry old_off, old_len, new_off, new_block, new_len, new_size, new_addr
new_off=old_off+old_len
new_block=new_off/__BLOCKSIZE__
new_len=new_size
	.byte	<new_block
	.word	(new_off & (__BLOCKSIZE__ - 1))
	.byte	$88
	.word	new_addr
	.word	new_len
.endmacro

You use the macro entry by feeding it the offset (which has the page number and offset) and length of the previous (old) area, plus the size and load address of the current (new) area.

image

This is what a call to the macro looks like:

entry off0, len0, off1, block1, len1, __INTRO_CODE_SIZE__+__INTRO_RODATA_SIZE__+__INTRO_DATA_SIZE__, __INTRO_CODE_LOAD__

The variables off0 and len0 are the offset and length from the RAM area. off1, block1 and len1 are variables that are passed without meaningful values. They will get a value after the macro has executed. Two of these values (new_off and new_len) will be used to feed into another macro call to create the next entry.

image

You will also add the new size and address. This repeats over and over again until all entries have been created.

entry off1, len1, off2, block2, len2, __OUTRO_CODE_SIZE__+__OUTRO_RODATA_SIZE__+__OUTRO_DATA_SIZE__, __OUTRO_CODE_LOAD__

One thing we didn’t cover so far is where all these double underscore pre- and postfixed values come from. They are imported at the top of the file and come from values that the linker has emitted. Remember how the linker would create _SIZE__ and _LOAD__ postfixed values for each MEMORY and SEGMENTS declared area and segment that has the define=yes attribute set. From these values you can calculate the file entries, by simply adding the sizes to the old offsets. It’s like creating a long line of the memory areas, one after the other.

Typically you will use the _LOAD__ of the first segment in an area. The order of the segments is not really relevant, except that you need to take the first one listed per area. Like the areas themselves, the segments are laid out sequentially in a particular area. This implies that the first segment is located at the beginning. The various sizes you add together come from all the segments that are located in the area. There could be more than just one of each CODE, DATA and RODATA. It all depends on how you assigned segments to areas.
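To see the chaining at work, here is a host-side sketch in C that mirrors the arithmetic of the first entry and of the entry macro. It is an illustration only; dir_entry, make_entry and next_entry are hypothetical names and the real work is done by the assembler when it evaluates directory.asm.

```c
#include <stdint.h>

#define BLOCKSIZE 2048u  /* __BLOCKSIZE__ from the SYMBOLS section */

/* Hypothetical record of the values the entry macro works with. */
typedef struct {
    uint32_t off;    /* absolute offset of the file on the cartridge */
    uint32_t len;    /* length of the file in bytes */
    uint8_t  block;  /* page number written to the entry */
    uint16_t rest;   /* offset inside the page written to the entry */
} dir_entry;

/* Builds an entry for a file that starts at absolute offset 'off'. */
dir_entry make_entry(uint32_t off, uint32_t len)
{
    dir_entry e;
    e.off   = off;
    e.len   = len;
    e.block = (uint8_t)(off / BLOCKSIZE);
    e.rest  = (uint16_t)(off & (BLOCKSIZE - 1));  /* same masking as the macro */
    return e;
}

/* Mirrors the macro: the new file starts where the old one ends. */
dir_entry next_entry(dir_entry old, uint32_t new_size)
{
    return make_entry(old.off + old.len, new_size);
}
```

Taking the numbers from the map file excerpt in the Troubleshooting section: the directory starts at $CB and is $28 bytes long, so off0 is $F3, and len0 is $7D+$2F+$1435+$102+$138 = $171B. The intro entry then gets off1 = $180E, which is page 3 with offset $0E, and a length of $47+$1D+$00 = $64. The outro entry follows at $1872, page 3 with offset $72.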

You need to adjust the lynxcart.cfg file to accommodate the space required in the directory. This is as simple as specifying the number of file entries in the multiplication (6 file entries in the example below):

  DIR:    file = %O,               start = $0000, size = 6*8;

After that you are good to go.

Putting it all together

Let’s do a quick recap of what is needed to create files and the corresponding directory.

  1. Create your memory areas based on the sizes they will need.
  2. Make a layout of the areas in memory and moments in time.
  3. Determine the start locations and finalize the definitions of the areas in the lynxcart.cfg
  4. Create segments that correspond to the areas and list them in lynxcart.cfg
  5. Write code and make resources and put these into the correct segments
  6. Create a directory that lists all memory areas, starting with the RAM area.
  7. In each part of your code, be mindful of the expectations for segments that need to be loaded. Call lynx_load before accessing functions, variables or resources in a segment.

To help out with the loading, do yourself a favor and define symbolic names to represent the file numbers. The files start at number 0, which is the RAM area and resident code. It is unlikely that you will ever have to load this yourself. After that the other files follow:

#define FILE_INTRO 1
#define FILE_OUTRO 2
#define FILE_MAINGAME 3

Here is a typical piece of code that shows how to use the lynx_load and file numbers.

lynx_load(FILE_INTRO);
show_intro();

In the code fragment, which could be inside your void main() static entry point routine, the show_intro function is located in the INTRO area. It needs to be loaded before it can be called. Hence the call to lynx_load, passing in the FILE_INTRO symbol. Having the names and number of files decoupled will be very useful when you need to reorder files in the binary image. You can change the file numbers in one place and will not have to hunt through your code to check where you used the particular file number that has changed.
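For the example game, the resident core could keep the choice of files in one place as well. The sketch below is one possible way to do that. The music file numbers, the game_state type and the three helper functions are assumptions for this illustration; only FILE_INTRO, FILE_OUTRO and FILE_MAINGAME come from the defines above.

```c
#define FILE_INTRO    1
#define FILE_OUTRO    2
#define FILE_MAINGAME 3
/* Hypothetical file numbers for the three pieces of music. */
#define FILE_MUSIC1   4
#define FILE_MUSIC2   5
#define FILE_MUSIC3   6

/* Hypothetical game states, one per memory layout. */
typedef enum { STATE_INTRO, STATE_GAME, STATE_GAMEOVER } game_state;

/* File that holds the code and resources for a state. */
int code_file_for(game_state state)
{
    switch (state) {
    case STATE_INTRO:    return FILE_INTRO;
    case STATE_GAME:     return FILE_MAINGAME;
    default:             return FILE_OUTRO;
    }
}

/* File that holds the music for a state (assumed: one piece per state). */
int music_file_for(game_state state)
{
    switch (state) {
    case STATE_INTRO:    return FILE_MUSIC1;
    case STATE_GAME:     return FILE_MUSIC2;
    default:             return FILE_MUSIC3;
    }
}

/* Intro, then game, then game over, then back to the intro. */
game_state next_state(game_state state)
{
    switch (state) {
    case STATE_INTRO:    return STATE_GAME;
    case STATE_GAME:     return STATE_GAMEOVER;
    default:             return STATE_INTRO;
    }
}
```

On the Lynx the core would call lynx_load(code_file_for(state)) and lynx_load(music_file_for(state)) before jumping into the code for that state, and move on with next_state when the state has finished.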

At this point it is worth mentioning that the first file (RAM) is only there if you use the mini-bootloader. That loader does not require a startup sprite. For games that use the Epyx bootloader, you would have seen a first file pointing to sprite data, and the second file to the resident RAM file.

Troubleshooting

You might be lucky and get it all to work the first time. If you did not manage to do so, or want some more internal look at what has been done and generated, you need some more information. That’s where the map file comes in handy. The map file shows a lot of things for your program/game, including the segment locations and what is located where.

To generate a map file an additional argument is needed in the call to the CL65.exe linker.

$(CL) -t $(SYS) -m cart.map -C lynxcart.cfg -o $@ $(objects) lynx.lib

By adding -m and passing a filename the linker will emit a map file (cart.map in this case) that holds valuable info. Here is an excerpt:

Segment list:
-------------
Name                   Start     End    Size  Align
----------------------------------------------------
DIRECTORY             000000  000027  000028  00001
EXEHDR                000000  00003F  000040  00001
ZEROPAGE              000000  000019  00001A  00001
EXTZP                 00001A  000032  000019  00001
BOOTLDR               000200  0002CA  0000CB  00001
INTRO_CODE            000200  000246  000047  00001
OUTRO_CODE            000200  00023D  00003E  00001
OUTRO_RODATA          00023E  00025C  00001F  00001
INTRO_RODATA          000247  000263  00001D  00001
INTRO_BSS             000264  000264  000001  00001
STARTUP               003200  00327C  00007D  00001
INIT                  00327D  0032AB  00002F  00001
CODE                  0032AC  0046E0  001435  00001
RODATA                0046E1  004818  000138  00001
DATA                  004819  00491A  000102  00001
BSS                   00491B  004A32  000118  00001

The segment list above shows the start and end addresses, plus the sizes of the segments that are created. It is not the complete list, but you can notice how the OUTRO_BSS is missing. Apparently nothing was created in the BSS segment for the OUTRO and there was no need to emit it.

Exports list:
-------------
_FileBlockByte         00002F RLZ    _FileBlockOffset       000027 RLZ
_FileCurrBlock         00002E RLZ    _FileDestAddr          00002A RLZ
_FileDestPtr           000031 RLZ    _FileEntry             000026 RLZ
_FileFileLen           00002C RLZ    _FileStartBlock        000026 RLZ
__BLOCKSIZE__          000800 REA    __BOOTLDR__            000001 REA
__BSS_RUN__            00491B RLA    __BSS_SIZE__           000118 REA
__CODE_SIZE__          001435 REA    __CONSTRUCTOR_COUNT__  000000 REA
__CONSTRUCTOR_TABLE__  0032AC RLA    __DATA_SIZE__          000102 REA
__DESTRUCTOR_COUNT__   000000 REA    __DESTRUCTOR_TABLE__   004815 RLA
__EXEHDR__             000001 REA    __INIT_SIZE__          00002F REA
__INTERRUPTOR_COUNT__  000002 REA    __INTERRUPTOR_TABLE__  004815 RLA
__INTRO_CODE_LOAD__    000200 RLA    __INTRO_CODE_SIZE__    000047 REA
__INTRO_DATA_SIZE__    000000 REA    __INTRO_RODATA_SIZE__  00001D REA
__OUTRO_CODE_LOAD__    000200 RLA    __OUTRO_CODE_SIZE__    00003E REA
__OUTRO_DATA_SIZE__    000000 REA    __OUTRO_RODATA_SIZE__  00001F REA
__RAM_SIZE__           008638 REA    __RAM_START__          003200 RLA
__RODATA_SIZE__        000138 REA    __STACKSIZE__          000800 REA

Many of these items, such as the _SIZE__ and _LOAD__ values, will look familiar by now. Inspecting these values helps to find any overflow errors that the linker might report, or to troubleshoot directory issues. A detailed look at how to track these errors is for another time.
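A first check you can do by hand with the map file is to add up the sizes of all segments in an area and compare the total with the size of the area. A minimal sketch of that check is shown below; the function names are hypothetical and the check is only as good as the sizes you feed it.

```c
#include <stdint.h>
#include <stddef.h>

/* Adds up the sizes of all segments assigned to one memory area. */
uint32_t total_size(const uint16_t *segment_sizes, size_t count)
{
    uint32_t total = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        total += segment_sizes[i];
    }
    return total;
}

/* Returns by how many bytes the segments exceed the area, or 0 if they fit. */
uint32_t overflow_bytes(const uint16_t *segment_sizes, size_t count,
                        uint32_t area_size)
{
    uint32_t total = total_size(segment_sizes, count);
    return total > area_size ? total - area_size : 0;
}
```

For the INTRO area the segment list shows $47, $1D and $01 bytes for INTRO_CODE, INTRO_RODATA and INTRO_BSS. That is $65 bytes in total, which fits comfortably in an area of $3000 bytes.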

Next time

We’ve looked at how to design your memory and create segments, files and the entries for your directory. With this you can start building beyond the 64KB limit that you otherwise have. Next time we will look at encryption of headers, or maybe input. Who knows. Till next time.


Programming tutorial: Part 17–Interrupts

In part 13 we covered UART and serial communication. Then in part 14 we had a look at the timers inside of the Lynx. Both parts referred to interrupts as an important bit of functionality. Now it is time to dive deeper into interrupts and use their power to take your Lynx games to the next level.

Before we dig in deep into the Atari Lynx’s interrupts, you should have a good understanding of how interrupts function at the processor level. This will be a detailed overview, one that holds true for any 6502 processor, not just Mikey.

Backgrounder on 6502 family processors’ interrupts

There is an excellent write-up on 6502 interrupts by Garth Wilson over at 6502.org. I suggest you read it when you want to know the nitty-gritty details. I will provide a higher-level overview of what is important. Garth’s article fills in the gaps and the deeper details.

During normal operation the 6502 executes instructions by evaluating the opcodes at the current program counter (PC). It fetches the instruction located there and spends a couple of processor cycles performing the work. The PC is updated to point to the next instruction. Usually this is the next instruction in memory, but it can be somewhere else in case of branching (e.g. BEQ, BMI) or jumping (JMP, JSR). This mode of operation simply follows the flow of your code.

Getting interrupted

Normal operation can be interrupted by special events. In most cases this is the hardware telling the processor that something important has happened, such as input that is available (keyboard, serial IO) or a timer that has expired, and giving you a chance to handle that. These special events are appropriately called interrupts and they trigger a specific sequence of actions by the processor. The 6502 has two kinds of interrupts:

  1. IRQ: Interrupt ReQuests, normal interrupts
  2. NMI: Non-Maskable Interrupts, more important interrupts than the IRQ interrupts.

Both IRQ and NMI are essentially the same interrupts, except for an important distinction: normal interrupts can be “ignored” (also called masked), while NMI interrupts cannot be masked. This means that you can specify you do not want the IRQ interrupts to actually interrupt you, for example if you are in a critical piece of code execution, while NMI can never be suppressed.

The processor has separate interrupt pins for IRQ and NMI that are (optionally) connected to the hardware that can signal an interrupt. The most critical hardware will use the NMI pin, while other hardware uses the IRQ pin.

Both the IRQ and NMI signals are high by default and trigger when they go low. These lines can be edge or level sensitive. Usually IRQ lines are level sensitive and will keep firing as long as the line stays low. The NMI line on the other hand is edge sensitive in most cases and only triggers on a falling edge, to avoid having it triggered over and over again. That would be bad: since NMIs cannot be ignored, it would cause havoc.

The 6502 interrupt sequence

Whenever an interrupt occurs, be it an IRQ or NMI, the 6502 executes a sequence to stop the current execution of code and hand over control to an interrupt service routine (ISR). The ISR location is determined from what is called a vector. Essentially a vector is a memory location where the processor can find the jump address for the reset, IRQ or NMI routine. There are three vectors in the 6502:

Vector | Description | Address
NMI | Vector to NMI ISR | $FFFA (low byte) and $FFFB (high byte)
Reset | Vector to address of reset routine | $FFFC (low byte) and $FFFD (high byte)
IRQ | Vector to IRQ interrupt service routine | $FFFE (low byte) and $FFFF (high byte)

The picture below (from interrupt tutorial at 6502.org) shows what happens at the processor level per clock cycle.

figure_1[1]

The instruction that was executing is finished first. As soon as that is done, the current program counter (address of next instruction) is pushed onto the stack: first the high byte, then the low byte. Next, the status register (processor status or PS) is also pushed onto the stack. Finally the IRQ or NMI vector is fetched from its respective address and the processor continues execution at the address found there.
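
The sequence above can be modelled in a few lines of C. This is only a sketch for illustration: the Cpu struct, the function names and the flat 64KB memory array are made up for this example. It also sets the I flag on entry, which is standard 6502 behaviour, and it ignores details such as the B flag and the always-set bit 5 in the pushed status.

```c
#include <stdint.h>

/* Minimal CPU state for illustration; the field names are hypothetical. */
typedef struct {
    uint16_t pc;      /* program counter (address of next instruction) */
    uint8_t  sp;      /* stack pointer, the stack lives at $0100-$01FF */
    uint8_t  status;  /* processor status (PS) */
} Cpu;

#define FLAG_I   0x04u    /* Interrupt Disable bit in PS */
#define VEC_NMI  0xFFFAu
#define VEC_IRQ  0xFFFEu

static void push(Cpu *cpu, uint8_t *mem, uint8_t value)
{
    mem[0x0100u + cpu->sp] = value;
    cpu->sp--;                      /* wraps around like the real SP */
}

/* Enter an interrupt through the given vector (VEC_IRQ or VEC_NMI).
   mem must point to a full 64KB address space. */
void enter_interrupt(Cpu *cpu, uint8_t *mem, uint16_t vector)
{
    push(cpu, mem, (uint8_t)(cpu->pc >> 8));     /* high byte first */
    push(cpu, mem, (uint8_t)(cpu->pc & 0xFFu));  /* then the low byte */
    push(cpu, mem, cpu->status);                 /* then the PS */
    cpu->status |= FLAG_I;                       /* mask further IRQs */
    cpu->pc = (uint16_t)(mem[vector] | (mem[vector + 1] << 8));
}
```

With $00 and $F0 stored at $FFFE and $FFFF, calling enter_interrupt with VEC_IRQ leaves the program counter at $F000 and three bytes on the stack.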

You can think of the interrupt and the vectors as a JSR to the addresses stored at $FFFE and $FFFA. Something like JSR ($FFFE) for IRQ and JSR ($FFFA) for NMI. It is not exactly the same, because the PS is also placed onto the stack and the exact PC value is somewhat different. Plus, you return from a JSR with an RTS, but from an IRQ or NMI with an RTI (ReTurn from Interrupt). Other than that the two are comparable to a certain extent.

A “Hello World” Interrupt Service Routine

A really simple interrupt service routine might look like this:

F000  INC $1337
F003  RTI

It could have been as simple as a single RTI, but that would have been a stubbed-out handler that returns as soon as it is called without actually doing anything. That is useful only when you want to have an empty ISR. Instead, the example above increases a counter at address $1337 every time the ISR is executed.

You need to put the address of the ISR ($F000 in this case) in the IRQ interrupt vector during startup, or an interrupt will jump off into unknown byteland. The wiring-up boils down to a bit of code like this:

SEI
LDA #$00
STA $FFFE ; or STZ $FFFE for short
LDA #$F0
STA $FFFF
STZ $1337 ; Initialize the data register
CLI

which puts the bytes $00 and $F0 at the low and high byte of the IRQ vector at $FFFE and $FFFF respectively. We will talk about the SEI and CLI instruction in a moment.

Writing an interruptor in CC65

The CC65 compiler allows you to create an interrupt service routine through assembler code. There is some special syntax required to wire your code to be called when an interrupt fires. Here is a sample that implements the simple handler:

.interruptor _handler
 
.proc   _handler: near
 
.segment "CODE"
	inc $1337
 
done:
	clc
	rts
	
.endproc

But wait, what’s this? There is no RTI at the end and there is a mysterious CLC instruction. The reason is that interruptor handlers are wired together by the CA65 assembler. Each interruptor address is stored in a table. Whenever an IRQ occurs, every handler in the table is called in order of priority. Each handler can indicate whether the other handlers still need to be called. It is the carry flag that conveys this intention. When the carry flag is cleared, the calling of other handlers should continue. If set, the handler tells the runtime that it has completely handled and cleared the interrupts, and calling the others is not needed anymore.

A priority is specified as follows:

.interruptor _vbl, 15

The number indicates the priority. A higher value gives the handler a higher priority. The default value is 7. You can read some more at the CC65 documentation wiki on the .interruptor control command.
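
To make the chaining concrete, here is a small C model of the idea. It is not the actual CC65 runtime code: the Interruptor struct, dispatch_irq and the two sample handlers are hypothetical names, and a boolean return value stands in for the carry flag (true for SEC, false for CLC). The table is assumed to be sorted by descending priority already.

```c
#include <stdbool.h>
#include <stddef.h>

/* A handler returns true when it fully handled the interrupt (carry set)
   and false to let the remaining handlers run (carry clear). */
typedef bool (*Handler)(void);

typedef struct {
    Handler fn;
    int     priority;   /* higher value runs earlier; informational here */
} Interruptor;

/* Two sample handlers, for illustration only. */
static int pass_calls, claim_calls;
bool passes_on(void) { pass_calls++;  return false; }  /* like CLC, RTS */
bool claims(void)    { claim_calls++; return true;  }  /* like SEC, RTS */

/* Call the handlers in table order and stop as soon as one claims the
   interrupt. Returns the number of handlers that were called. */
size_t dispatch_irq(const Interruptor *table, size_t count)
{
    size_t called = 0;
    for (size_t i = 0; i < count; i++) {
        called++;
        if (table[i].fn()) {
            break;      /* handler claimed the interrupt: skip the rest */
        }
    }
    return called;
}
```

When the claiming handler has the higher priority, the second handler is never reached, which is the mechanism that makes overriding another handler possible.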

There is always a VBL handler created if you use the TGI library. TGI uses a handler to perform a swap of the video buffers at the right time, so no screen tearing occurs. Screen tearing would happen when the swap is performed midway during the drawing of the current buffer. The VBL interrupt is an excellent moment to do it, hence the choice for a VBL handler.

Here are a couple of strategies for building your handlers:

  • Strategy 1: Create a big handler that checks for each and every interrupt source. This would keep the handler table small and give a single point to have your own interrupt handling logic.
  • Strategy 2: A handler per interrupt type. It will give small and concise handlers that are easy to maintain. It implies a little more overhead of multiple jumps, but it is disputable if that is noticeable or significant.
  • Strategy 3: Override the TGI handler with your own. Since you would otherwise need to recompile the TGI library to alter its VBL handler, you could instead specify your handler with a higher priority and return with a SEC instruction before the RTS.

Don’t interrupt me

There are occasions when you do not want interrupts to occur. These are some typical moments when it is inconvenient to be disturbed:

  1. An ISR is already executing
    Once an ISR is executing it can be impractical to have a new IRQ come in and trigger a new ISR from the current ISR. That would make it a bit like the movie Inception, where dreams occur within dreams within dreams within dreams… You get the picture.
  2. You are manipulating the vector address values
    When you have changed either the low or high byte but not the other, there is a very brief moment where the vector address is invalid. Should an interrupt request come in at that particular time, it will probably lead to unwanted and unexpected behavior.
  3. Bootstrapping or initialization code is running
    At this time things may not have been properly set up for the program and data registers to start executing ISR code.

The 6502 processor has a bit flag in the processor status called I (for Interrupt Disabled) that determines whether an IRQ is acknowledged or not. NMI interrupt requests are unaffected by the bit, because they are unmaskable and cannot be suppressed or ignored.

When the I bit is set, no IRQ requests are responded to. You can influence the bit with two instructions:

SEI    ; Set Interrupt Disable flag: masks IRQ interrupts
CLI    ; Clear Interrupt Disable flag: listen to IRQs again.

By default new IRQ interrupts are ignored during an ISR, because the processor sets the I flag itself when it enters the ISR. So, you do not have to call SEI at the start of your ISR code, be it IRQ or NMI triggered.

Remember that NMI interrupts are always acknowledged, whether the I flag is set or not. However, when an NMI or IRQ interrupt service routine is executing, you can choose to call CLI and let new IRQ requests come through, should they occur.

You might want to clear the source of the interrupt before calling CLI to accept new IRQs or RTI to return from an interrupt. Since the IRQ line is level-sensitive it is important to note that when the level is still low, a new IRQ will immediately fire. Luckily, the Lynx has edge-sensitive IRQs, so you don’t have to take that into account, except for UART interrupts as these are level sensitive. We will talk about this later.

Interrupt sources in the Lynx

The Lynx has 8 distinct hardware sources that trigger interrupts. The hardware is always a timer. And, since the Lynx has 8 timers, an interrupt can come from each (and all) of those sources.

A quick recap of the timers that the Mikey holds:

Timer # | Description | Relation to interrupt
0 | Horizontal blank (HBLANK or HBL) | Fires when the end of a “scanline” has been reached.
1 | General purpose timer 1 |
2 | Vertical blank (VBLANK or VBL) | Fires interrupt after all lines on a screen have been drawn. Useful for doing work that is screen critical (such as the moment of swapping screen buffers).
3 | General purpose timer 3 |
4 | UART RX or TX related | Doesn’t fire at timer expiration, but rather at the moment when data has arrived in receive buffer or when transmit buffer is empty.
5 | General purpose timer 5 |
6 | General purpose timer 6 |
7 | General purpose timer 7 |

Each of these timers has an Enable Interrupt bit in its static control register A. Only when this bit is 1 (enabled) will the interrupt fire at the moment of timer expiration. In code this would look something like this:

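#include <lynx.h>   /* CC65 header that declares the MIKEY registers */

#define ENABLE_INTERRUPT 0x80  /* bit 7 of static control register A */
/* 0x1E = reload enabled, count enabled, 64 microsecond clock source */
MIKEY.timer1.control = ENABLE_INTERRUPT | 0x1E;
MIKEY.timer1.reload = 255;
MIKEY.timer1.count = 255;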

The one exception here is the UART timer #4. This timer’s interrupt does not fire at the timer expiration. The timer’s purpose is to generate the baud rate for UART and it will expire at a steady pace to transfer the single bits of data, plus some extra such as the start, stop and parity bit. Lots of expirations that do not really matter. The relevant moment to fire an interrupt for UART is when data has arrived in the receive buffer, or when there is no more data to be sent (if the transmit buffer runs empty). For that, you need to set the TX and RX Interrupt Enable flags (TXINTEN and RXINTEN) to 1 for enabled.

MIKEY.serctl = TXINTEN | RXINTEN | PAREN | RESETERR | TXOPEN | PAREVEN;

This enables both receive and transmit interrupts, besides the normal settings for enabling even parity while resetting any errors and switching the UART to open collector.

In summary, the timers will generate IRQs when they are configured to do so. The timers will always run, no matter what type of code is executing. Once expired they will generate an interrupt, but this will only cause the call of the ISR through the IRQ vector when the I flag of the processor status register is not set.

Inspecting the sources

When an IRQ occurs it is often necessary to determine the source of the interrupt. It could be any one of the 8 timer sources or a combination of them. Each of the timers has an interrupt flag associated with it. Each and every interrupt flag that is set will pull the IRQ signal low and raise an interrupt.

The 8 bits of the interrupts flag would fit nicely into a byte, right? Mikey has two special interrupt related hardware registers for that very byte. These are INTRST ($FD80) and INTSET ($FD81).

image

Their purpose is to allow you to inspect and manipulate the sources of the interrupts by looking at the bits of the value located in each of them. They both hold the same set of bits when you read from either address. The value for INTSET or INTRST has the bits from the interrupt flags in this order: timer 0 at bit 0 up to timer 7 at bit 7. Writing to INTSET and INTRST is a totally different thing.

image

INTRST will set interrupt flags to zero when written to. It resets the flags for the bits that are set in the (mask) value you write and leaves the other flags unaffected.

image

INTSET will push the values written into it to the interrupt flags. It provides an easy way to reset them all by writing a zero to it (just like writing $FF to INTRST would). On the other hand writing a non-zero value will cause an interrupt flag (or flags) to be set, effectively causing an IRQ indirectly.

The best practice is to read from INTSET at the beginning of your ISR code. It will get you the bits for the expired timers and the serial interrupt. When you have nearly finished your ISR you can write the value you read from INTSET to INTRST, causing those interrupts to be reset. If a new interrupt occurred during the execution of your ISR, the respective bit or bits are unaffected. When the ISR returns to normal code, there is still a bit set in the interrupt flags and a new IRQ will occur. That is probably intended, because otherwise you would miss the new interrupt, and you want to handle that as well.

Reading INTSET and writing that to INTRST is usually a good approach.
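
The read-then-acknowledge pattern can be simulated in plain C. This is a model of the behaviour described above, not code that touches the real registers: the IrqFlags struct and all function names are invented for the example, and the late_arrivals parameter simply fakes interrupts that come in while the ISR is busy.

```c
#include <stdint.h>

/* Model of the 8 interrupt flags; bit n belongs to timer n. */
typedef struct {
    uint8_t flags;
} IrqFlags;

/* Reading INTSET (or INTRST) returns the current flags. */
uint8_t read_intset(const IrqFlags *m)
{
    return m->flags;
}

/* Writing a mask to INTRST clears exactly the flags set in the mask. */
void write_intrst(IrqFlags *m, uint8_t mask)
{
    m->flags &= (uint8_t)~mask;
}

/* Skeleton of an ISR: read the sources first, acknowledge them last.
   Returns the sources that were pending on entry. */
uint8_t service_interrupts(IrqFlags *m, uint8_t late_arrivals)
{
    uint8_t pending = read_intset(m);   /* sources to handle now */
    /* ... handle the sources found in 'pending' here ... */
    m->flags |= late_arrivals;          /* new interrupts during the ISR */
    write_intrst(m, pending);           /* reset only what was handled */
    return pending;
}
```

Because only the bits that were read at the start are written back, a flag that was raised in the meantime survives and causes a new IRQ after the ISR returns.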

However, the UART triggered interrupts are level-sensitive, so they will keep triggering unless you clear the source explicitly. Here is an excerpt from the Epyx development kit’s documentation on UART and ComLynx:

“7. Unusual interrupt condition.
Well, we did screw something up after all. Both the transmit and receive interrupts are ‘level’ sensitive, rather than ‘edge’ sensitive. This means that an interrupt will be continuously generated as long as it is enabled and its UART buffer is ready. As a result, the software must disable the interrupt prior to clearing it.
Sorry.”

Another example: HBL and VBL interrupts

A more complete example is one where we do some effects based on horizontal and vertical blank (HBL and VBL) interrupts. The goal is to change the color of the black pixel each scan line, which creates the banded effect on screen. It is as simple as increasing the red value at $FDB0 (BLUERED0). The difficult part is that this has to be done for every scanline.

By now we know that the HBL occurs when timer 0 expires. It has its interrupt enabled by the boot rom initialization. That part is covered. This is the interruptor we need to include in our code:

.interruptor _hbl
.include "lynx.inc"
.export _hblcount
 
_hblcount:
	.byte   $00
 
.proc   _hbl: near
 
.segment "CODE"
	lda INTSET
	and #TIMER0_INTERRUPT
	beq done
	inc RBCOLMAP+0
	inc _hblcount
 
done:
	clc
	rts
	
.endproc

The bolded statements are of most interest. Taking it from the top, an interruptor is declared to point to the handler routine called _hbl. There is also an exported variable called _hblcount that serves as a counter for the total number of HBL interrupts. The code in the CODE segment loads the interrupt flags from INTSET and checks whether the flag for timer 0 (the HBL timer) is set. If so, this handler was called for an HBL IRQ and it continues by increasing the red value (note that it will also increase the blue value every 16th HBL) and the HBL counter. If not, the two increase operations are skipped. Finally, we clear the carry flag to indicate that other handlers should still execute.
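
The remark about the blue value follows from the way one palette byte holds two colors. A tiny C sketch shows the effect, assuming (as the text implies) that red sits in the low nibble and blue in the high nibble of the byte at RBCOLMAP+0. The helper names are made up for this example.

```c
#include <stdint.h>

/* Assumed layout of the palette byte: blue in bits 7-4, red in bits 3-0. */
uint8_t red_of(uint8_t bluered)  { return (uint8_t)(bluered & 0x0Fu); }
uint8_t blue_of(uint8_t bluered) { return (uint8_t)(bluered >> 4); }

/* Value of the palette byte after n HBL increments (INC RBCOLMAP+0). */
uint8_t after_hbls(uint8_t bluered, unsigned n)
{
    return (uint8_t)(bluered + n);
}
```

Starting from zero, red climbs from 0 to 15 over the first 15 scanlines. The 16th increment wraps red back to 0 and carries into the blue nibble, which explains the bands on screen.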

image

The other handler, for vertical blank (VBL) interrupts, is fairly similar. It checks for timer 2 instead of timer 0 and will:

  • Increase a frame counter variable
  • Reset the Red/Blue value to zero, so we always start the new frame with the same value
  • Store the last HBL count, so we can see how many HBL interrupts fire per frame.

.interruptor _vbl
.include "lynx.inc"
.export _framecount
.export _lasthblcount
.import _hblcount
 
_framecount:
	.byte  $00
_lasthblcount:
	.byte	$00
 
.proc   _vbl: near
 
.segment "CODE"
	lda INTSET
	and #TIMER2_INTERRUPT  ; Check for VBL timer
	beq done
 
	inc _framecount   ; Count this frame
	stz RBCOLMAP+0    ; Reset Red/Blue value to create steady image
	lda _hblcount
	sta _lasthblcount
	stz _hblcount
done:
	clc
	rts
	
.endproc

When you look at the screenshot above you can see a couple of remarkable things. First, there are 105 horizontal blanks per frame. This is in accordance with the 102 visible lines plus the documented 3 scanlines of blank time every frame. Second, you can see that the HBLs for the 3 invisible lines come right after a VBL interrupt. The first band is 3 pixels smaller than the others, which can only happen if the HBL counter was already running 3 lines before the first visible line is drawn. This observation was already shared by TailChao at the AtariAge forum.

Next time

This time we dove into interrupts for the 6502 processors and looked at some Lynx console specific details.

In the meantime, some additional reading on interrupts in the Lynx is available in the Epyx documentation:


Epyx Development Kit: part 2–Pinky and Mandy

Working with Pinky and Mandy

Let’s skip a lot of the things you need to do to create your first Lynx binary, be it a game or another type of program, and pick up at the point where you want to run your program on real hardware. In the early days of Lynx development there was no emulator, so you could only see and test your code running on an actual device. Nowadays it is trivial to create an encrypted ROM, put it on a FlashCard and run it on a normal Lynx. In 1989 Lynx developers only had either Pinky/Mandy or Howard/Howdy. The focus here is on Pinky and Mandy, although most of it holds true for Howard and Howdy.

Pinky can function in one of two ways. It can be a FlashCard of sorts, where you can upload your code in the device and have it act as a cartridge to Mandy. Additionally, it can be a passthrough communication device that facilitates a live (remote) debugging session between the Amiga computer and the Mandy console. A set of jumper switches allowed you to change the mode of Pinky and configure its memory size and use of the EPROM with the Pinky bootloader.

WP_20140426_015

For a regular debug session with Pinky and Mandy you would connect Pinky to the Amiga through the parallel port and to Mandy using the proprietary flatbed cable. Next, you would start the Amiga and run the ManDebug debugger program. Here’s a small bit of what is happening under the covers when you boot an Amiga machine that has been modified by the Epyx SDK.

image image

Essentially it assigns drive letters (symbolic names) to the two important SDK folders 6502 and HANDY. It also adds the HANDY drive to the search path, so you can run the SDK tooling from everywhere. Finally, the ManDebug program is launched separately from the Shell, keeping it free to do other things.

When ManDebug launches it presents a console application that looks somewhat like this screenshot in the WinUAE emulator:

image

image image image

Even though it is named HanDebug at the top, it actually is ManDebug. You can see so when you look at the greyed out tabs for Trace and ROM, plus the greyed out button that reads Bus Monitor. That is the missing functionality in Pinky when compared to Howard. (Thanks James Jacobs for the hint to change the Preferences to 80 column Text Mode)

The pictures below show what ManDebug looks like running on my Commodore Amiga 2000 computer, plus what Mandy shows on the screen after booting (a line indicative of the loader program placed in middle of video memory).

WP_002599 WP_002512 (1)

Booting Mandy

The Mandy console uses a normal Atari Lynx power supply with +9V DC. The power adapter powers both the Mandy and Pinky devices and it needs the full 9 Volts to do so. Mandy switches on like a normal Lynx and immediately turns on Pinky as well. The boot process depends on the jumper settings. Assume that it is set for the debugging scenario, meaning that it will use the Pinky EPROM as the first content on the cartridge.

WP_20140426_017

The EPROM content is encrypted and follows the normal decryption process once Mandy has booted and loaded the “cartridge”. You can find the decrypted contents at $0200 like you would for regular (commercial) cartridges.

The Pinky EPROM will load a second stage loader that expects a block of bytes to be uploaded from the Amiga via Pinky to Mandy.

0200     A2 66         LDX #66
0202     BD 1B 02      LDA 021B,X     ; Copy second stage loader to $3000
0205     9D FF 2F      STA 2FFF,X
0208     CA            DEX
0209     D0 F7         BNE 0202

020B     A9 08         LDA #08
020D     8D F9 FF      STA MAPCTL     ; set vectors for RAM
0210     A9 00         LDA #00
0212     8D FA FF      STA CPU_NMI+LO ; set NMI to point to code
0215     A9 30         LDA #30
0217     8D FB FF      STA CPU_NMI+HI
021A     80 FE         BRA 021A       ; sit here waiting for NMI

This part copies the second stage loader from $021C (directly after the first part; the base address $021B in the LDA is one lower because X counts down from $66 to 1) to $3000. This address also serves as the NMI vector, i.e. the address that will be called when an NMI occurs. Finally, the first stage stays in an endless loop and waits until an NMI interrupt occurs. Harry Dodgson explained this to be a “hardware lockout”. Indeed, the Pinky loader is general purpose: it circumvents the need to have an encrypted header for the ROM itself and bypasses any checksumming on the contents of what is loaded. Essentially, it is a perfect Trojan horse to get your homebrew code into a Lynx. The dependency on an NMI hardware signal makes it impossible to use this without a Lynx console that has been modified to allow an NMI falling edge to occur on the respective pin on Mikey. So, it is a lockout of the ROM using hardware.
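
The copy loop in the listing is easy to misread because X counts down and never reaches zero inside the loop body. Here is the same loop written out in C as a sketch, with a flat 64KB array standing in for Lynx memory. The function name is made up for the example.

```c
#include <stdint.h>

/* C rendition of the copy loop at $0200-$020A.
   mem must point to a full 64KB address space. */
void copy_second_stage(uint8_t *mem)
{
    uint8_t x = 0x66;                        /* LDX #$66 */
    do {
        mem[0x2FFF + x] = mem[0x021B + x];   /* LDA $021B,X / STA $2FFF,X */
        x--;                                 /* DEX */
    } while (x != 0);                        /* BNE back to the LDA */
}
```

The loop runs for X = $66 down to 1, so it moves the $66 bytes at $021C-$0281 to $3000-$3065. The bytes at $021B and $2FFF themselves are never touched.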

Uploading the monitor program

The next step involves pressing the NMI button on the front of the Pinky device. It will trigger an NMI, and Mikey will jump to the NMI vector at $3000. That piece of code is going to wait for the following bytes to arrive through the parallel port:

Load address (2 bytes): LO, HI
Length (2 bytes in 2’s complement): LO, HI
Actual bytes of file

The first four bytes indicate the load address and length. The loader reads the data, copies it to Mandy RAM memory at the load address and finally executes the loaded code.
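
As an illustration, this is how the four header bytes could be decoded in C. It is a sketch under one assumption that the text does not spell out: “2’s complement” is taken to mean that the length field holds the negated byte count, so that a loader can count it up to zero. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t load_address;  /* where the data goes in RAM */
    uint16_t length;        /* number of data bytes that follow */
} UploadHeader;

/* Decode the 4 header bytes: address LO, HI, then length LO, HI.
   Assumption: the length is stored negated (two's complement).
   Returns 0 on success, -1 when fewer than 4 bytes are available. */
int parse_upload_header(const uint8_t *bytes, size_t size, UploadHeader *out)
{
    if (size < 4) {
        return -1;
    }
    out->load_address = (uint16_t)(bytes[0] | (bytes[1] << 8));
    uint16_t negated = (uint16_t)(bytes[2] | (bytes[3] << 8));
    out->length = (uint16_t)(0u - negated);   /* undo the negation */
    return 0;
}
```

With the made-up header bytes $00 $F9 $00 $FA this yields a load address of $F900 and a length of $0600 bytes.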

For debugging, it is necessary to first upload a file called monitor.bin. You click the Bootstrap button and a dialog opens that lets you specify the file.

image

The monitor.bin file has code that will communicate with the Amiga by using Pinky as a “dynamic” cartridge. It will use different cartridge pages to indicate which parallel port line it wants to read from or write to. It will use the data lines of the port for data transfer and the input and output control lines for negotiating the conversation.

Mandy will run the monitor program after having loaded it at $F900-$FF00. The monitor will initialize itself by setting the IRQ and NMI vectors to point to its own two handler routines. It performs a handshake ritual to initiate the communication with Pinky whereby the connection with the Amiga is established. The title of ManDebug will change from “Parallel Port is DOWN” to “Parallel Port is ACTIVE”. Then the monitor sits idle waiting for the first incoming command from ManDebug.

Communication between ManDebug and Mandy

All electronics aside, the communication between the Amiga and Mandy consists of sending commands from the Amiga to Mandy and receiving answers in the opposite direction.

It is always ManDebug that initiates a command, but the monitor needs to be in control. It can come into control in one of two ways:

  1. Pressing the NMI button on Pinky
    This will create an NMI signal that causes the NMI handler in the monitor to execute, because the NMI vector is set to point there. At that stage the monitor takes over control and performs its handshake ritual.
  2. A BRK instruction in code is encountered
    Whenever a BRK is executed by the 65SC02 processor, it will perform its normal IRQ routine and jump into the IRQ handler. But, for a BRK instruction the B flag is set in the processor status register before it is pushed onto the stack. It allows the handler to inspect whether a normal IRQ from the timers came in or a software IRQ from a BRK instruction.
    The IRQ handler is also inside the monitor and will check the presence of the B flag (for the BRK). If present, the NMI handler will be called. Otherwise, the normal IRQ jump table will be used to jump into the respective IRQ handlers that were (potentially) registered for the 8 timer IRQs of Mikey.
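
The check for the B flag boils down to looking at the status byte that the processor pushed last. Below is a rough C model of that test, not the monitor’s actual code: the names are invented, and a memory array with the stack page at $0100-$01FF stands in for Mandy’s RAM.

```c
#include <stdint.h>

#define FLAG_B 0x10u   /* Break flag, bit 4 of the pushed status */

typedef enum { SOURCE_HARDWARE_IRQ, SOURCE_BRK } IrqSource;

/* sp is the stack pointer on entry to the IRQ handler. The status byte
   was pushed last, so it sits at $0100 + sp + 1 (with 8-bit wrap). */
IrqSource classify_irq(const uint8_t *mem, uint8_t sp)
{
    uint8_t pushed_status = mem[0x0100u + (uint8_t)(sp + 1u)];
    return (pushed_status & FLAG_B) ? SOURCE_BRK : SOURCE_HARDWARE_IRQ;
}
```

In assembler the same test is usually done with TSX, LDA $0101,X and AND #$10.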

Debugging with ManDebug

The topic of debugging deserves a chapter of its own, as a lot can be done. ManDebug offers the following functionality during a debug session:

  • Inspect and change internals of Mandy
    This includes the registers A, X, Y, the processor status, stack pointer and current program counter address and the current RAM memory (entire range from $0000 to $FFFF).
  • Watch memory variables
    The variables you want to inspect are single byte or double byte values located in memory. The current value is shown for the variable and is updated whenever it changes, provided the monitor is in control again.
  • Set breakpoints in the code
    A breakpoint will cause the monitor to be in control again, so you can inspect the state of Mandy, alter it if desired and resume execution.
  • Step through the code instruction by instruction
    It is possible to step into JSR routines or skip over them.
  • View memory structures
    When there is an area of memory that has a specific layout (such as a Sprite Control Block (SCB)), you can declare the structure of the memory and view the memory in a window that is specifically designed to show the structure.
  • Resume execution
    This will restore the pre-interrupt state and resume execution of the program. It means that the monitor releases control and gives control back to the program again (at the risk of not regaining it by a breakpoint).
  • Fill a memory range
    A single constant byte will be used to fill a part of memory. It can be used to wipe a piece of memory using all $FF for example.
  • Watch the Bus for special circumstances (Howard only)

image

  • Upload a memory range into ManDebug
    The specified range of current RAM memory values in Mandy is sent from Mandy to the Amiga and is stored in a file with a name and format you have chosen.
  • Download a file into Mandy
    This will send the contents of a file from the Amiga to Mandy.

image image

Sending commands

The ManDebug debugger can send its commands provided the monitor is listening for incoming commands. The handling is performed inside command loops. The main loop looks a little bit like this:

  1. Wait for command byte
  2. Check byte
    0x00: Done
    0x01: Download/Receive (from ManDebug to Mandy)
    0x02: Upload/Send (from Mandy to ManDebug)
    0x03: Continue
    0x04: Slave request (?)
    0x05: Go
  3. Repeat from top

The Download and Upload commands are other loops, with subcommands. We’ll discuss those shortly. “Continue” will restore the pre-NMI or IRQ values for the stack, A, X and Y plus the processor status register and jump back to the previous address before the interrupt. “Go” resets the IRQ jump table and then does a “Continue”. I have not figured out what the “Slave request” is supposed to do, but I assume it puts Mandy in control to initiate requests to ManDebug.

A lot of the available main loop commands are revealed in an include file (from the Epyx SDK) called monitor.i. It shows the following constant declarations:

NOP_REQUEST         .EQU    0
DOWNLOAD_REQUEST    .EQU    1
UPLOAD_REQUEST      .EQU    2
CONTINUE_REQUEST    .EQU    3
SLAVE_REQUEST       .EQU    4
GO_REQUEST          .EQU    5
SEE_HOWARD_REQUEST  .EQU    6
HIDE_HOWARD_REQUEST .EQU    7

The last two commands are not available in ManDebug (unfortunately), because of the missing hardware.
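
For tooling on the PC side, the constants from monitor.i translate directly into a C enum. The two helper functions are not part of the SDK; they are hypothetical conveniences for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Main loop commands, values taken from monitor.i. */
typedef enum {
    NOP_REQUEST         = 0,
    DOWNLOAD_REQUEST    = 1,
    UPLOAD_REQUEST      = 2,
    CONTINUE_REQUEST    = 3,
    SLAVE_REQUEST       = 4,
    GO_REQUEST          = 5,
    SEE_HOWARD_REQUEST  = 6,
    HIDE_HOWARD_REQUEST = 7
} MonitorRequest;

/* Human-readable name for a command byte, e.g. for a protocol trace. */
const char *request_name(uint8_t command)
{
    switch (command) {
    case NOP_REQUEST:         return "NOP";
    case DOWNLOAD_REQUEST:    return "DOWNLOAD";
    case UPLOAD_REQUEST:      return "UPLOAD";
    case CONTINUE_REQUEST:    return "CONTINUE";
    case SLAVE_REQUEST:       return "SLAVE";
    case GO_REQUEST:          return "GO";
    case SEE_HOWARD_REQUEST:  return "SEE_HOWARD";
    case HIDE_HOWARD_REQUEST: return "HIDE_HOWARD";
    default:                  return "UNKNOWN";
    }
}

/* The last two commands need the Howard hardware. */
bool needs_howard(uint8_t command)
{
    return command == SEE_HOWARD_REQUEST || command == HIDE_HOWARD_REQUEST;
}
```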

The “Download” and “Upload” loops have the following commands used by ManDebug:

END_OF_FILE    .EQU    $00  * Done and return to main loop
ORIGIN         .EQU    $01  * Set load address
DATA           .EQU    $02  * Transfer data (max 256 bytes)
RUN_ADDRESS    .EQU    $03  * Set run address
REGISTER       .EQU    $10  * Send/receive registers (A, X, Y, SP, PS and PC)
FILL_MEM       .EQU    $11  * Fill memory range with specified (single) value
LARGE_DATA     .EQU    $12  * Transfer data (max 65536 bytes)

Only commands $00, $01, $02 and $10 are used for Upload. Download uses all of them.

Using a combination of sending such commands, the functionality of ManDebug is implemented. For example, to download a file into Mandy, the debugger will send a DOWNLOAD_REQUEST first, then an ORIGIN command byte to indicate the load address, followed by LARGE_DATA (accompanied by the length and the actual data). It returns to the main loop again by END_OF_FILE.
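
The download example can be sketched as a function that builds the byte stream. Only the order of the command bytes comes from the description above. The layout of the fields is an assumption made for this example: a 2-byte little-endian load address after ORIGIN and a 2-byte little-endian length after LARGE_DATA. The function name is hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DOWNLOAD_REQUEST = 0x01 };                               /* main loop */
enum { END_OF_FILE = 0x00, ORIGIN = 0x01, LARGE_DATA = 0x12 };  /* sub loop */

/* Build the stream for downloading 'length' bytes to 'load_address'.
   Assumed field layout: address and length as 2 bytes each, LO then HI.
   Returns the number of bytes written, or 0 if 'out' is too small. */
size_t build_download(uint8_t *out, size_t capacity, uint16_t load_address,
                      const uint8_t *data, uint16_t length)
{
    size_t needed = 8u + (size_t)length;
    size_t n = 0;

    if (capacity < needed) {
        return 0;
    }
    out[n++] = DOWNLOAD_REQUEST;
    out[n++] = ORIGIN;
    out[n++] = (uint8_t)(load_address & 0xFFu);
    out[n++] = (uint8_t)(load_address >> 8);
    out[n++] = LARGE_DATA;
    out[n++] = (uint8_t)(length & 0xFFu);
    out[n++] = (uint8_t)(length >> 8);
    if (length > 0) {
        memcpy(out + n, data, length);
        n += length;
    }
    out[n++] = END_OF_FILE;     /* back to the main loop */
    return n;
}
```

A logic analyzer trace of a real session would be the way to confirm or correct the assumed field layout.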

With a logic analyzer you can actually see the bits going across the parallel port.

image

The picture above shows the data lines for a simple command sent from the Amiga to Mandy (through Pinky).

Breakpoints and stepping

Analysis of the various functionality showed that breakpoints and stepping are actually made possible by replacing the instruction at a specific address with a BRK instruction. A pretty nifty trick that makes use of the IRQ handler to call the NMI handler.

For breakpoints the original instruction is remembered and replaced by BRK. When the breakpoint is hit, the original instruction is restored. Stepping involves replacing the next instruction that will execute with a BRK. This can be determined easily if you know how the instructions behave. This is deterministic, even for branch instructions if you know the status flags. Fortunately, you can transfer the current state of the processor status as part of the whole set of 65SC02 registers mentioned earlier. When control returns after a step, the original instruction is restored and the next instruction is replaced.

It can happen that when a step is performed the next instruction is never reached. In that case ManDebug will report that control did not return. Your only option to regain control is by hitting a user-specified breakpoint or pressing the NMI button on Pinky.
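
The patching trick itself is tiny. Here is a minimal C sketch of it, working on a copy of Mandy’s memory. It is an illustration of the technique, not ManDebug’s code, and the Breakpoint struct and function names are made up. The BRK opcode is $00 on the 6502 family.

```c
#include <stdint.h>

#define OPCODE_BRK 0x00u   /* BRK opcode on 6502 family processors */

typedef struct {
    uint16_t address;    /* where the BRK was planted */
    uint8_t  original;   /* opcode byte that was overwritten */
} Breakpoint;

/* Remember the original opcode and replace it with BRK. */
Breakpoint set_breakpoint(uint8_t *mem, uint16_t address)
{
    Breakpoint bp = { address, mem[address] };
    mem[address] = OPCODE_BRK;
    return bp;
}

/* Put the original opcode back, e.g. when the breakpoint was hit. */
void clear_breakpoint(uint8_t *mem, const Breakpoint *bp)
{
    mem[bp->address] = bp->original;
}
```

Stepping works the same way: plant a BRK on the instruction that will execute next, and restore it when control comes back.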


Epyx Development Kit: Part 1–Contents

Contents of the Epyx development kit

The Epyx development kit for Handy consists of a number of items ranging from hardware to reference materials and software:

Mandy and Pinky

Mandy is a slightly modified, fully functional Lynx I that has the Reset and NMI pins connected and has a special “cartridge”. The cartridge is essentially a connector to Pinky. Pinky is a custom electronics board that sits between the Commodore Amiga and Mandy to facilitate the communication and optionally hold ROM images of your Lynx programs. It has two buttons to trigger the reset and NMI signal to Mandy.

WP_20140426_023_thumb[2] WP_20140426_009_thumb[2]
WP_20140426_002_thumb[2] WP_20140426_013_thumb

These pictures are from my own development kit. In clockwise order starting at the top left they show

  • Pinky and Mandy with a blue parallel cable and the special cable to Mandy
  • Special cartridge in Mandy holding the cable
  • Inside of Pinky
  • Outside of Pinky with the blue parallel cable and two buttons (Reset and NMI).

You can see some additional pictures here at the Handheld Museum.

Howard and Howdy

This hardware set was more expensive than Pinky and Mandy, but offered some extra functionality. The set has two pieces of hardware called Howard and Howdy. Where Mandy is a full Lynx, Howdy has no processor and memory of its own. Instead, the guts of the Lynx exist in Howard, a PC case with a huge motherboard.

post-5140-0-40247600-1354856273 post-5140-0-62583600-1354856263 post-532-129051274036 post-532-129051271498

It holds the 65SC02 processor and lots of RAM and additional logic to offer functionality for bus monitoring and tracing of your code. Howard is connected to Howdy for display, sound and input. In turn, the Amiga is connected to Howard. The last two pictures (from this thread at the AtariAge forums) show the Howdy console connected to Howard.

The Epyx kit did not include the Commodore Amiga 2000 that was needed. You could use any Amiga machine, but the 2000 model was recommended because of its hard disk and memory.

Reference manual

A binder filled with hundreds of pages detailing the internals of the Handy hardware, the use of the Amiga software and the Epyx SDK for developing Lynx programs. When updates to the SDK were made, addenda were issued that you could place in the binder.

post-27403-0-52763900-1393252317 post-27403-0-96812000-1393252337

Software

The software accompanying the development kit is provided on a set of 8 disks that contain the Amiga software, source code and samples you need to develop Lynx programs.

The SDK’s 3.5″ floppy disks restore a Quarterback backup set to the system partition. You needed to have the Quarterback software to use the disks. This is the way it works for the 1.6 revision of the SDK. Older sets may have worked in a different way. The backup sets were created using QB 4.2, but version 5.0 is also capable of restoring the set. The restore would add custom files for the Workbench 1.3 operating system under C2, replace some of its system files in the C folder and place the SDK tools and source code (actual sources, include files, macros and sample code) in two folders called 6502 and HANDY.

post-27403-0-27857700-1393252346

The development software contained the compiler, sound and ROM creation tools, and the source code for building Lynx programs. Additionally it had Amiga tools that made working with the Amiga as a development machine a little easier (e.g. faster fonts and a better text editor).


Lexis easter egg

The game Lexis published by Songbird has a really neat easter egg. You can play a game of Galaxian whenever you feel like it. Here’s how to access the easter egg:

Galaxian in Lexis6

Go to the Table of Contents screen and press Left, Right, Left, Right, Up, Down, Option 1 and finally Option 2. After you have done that, start a regular game of Pages. It may seem that the game simply starts, but you get an easy finish, by completing the word “SCIENTOLOGY” with the missing T. Receive compliments and enter your name in the highscore table.

Galaxian in Lexis3 Galaxian in Lexis4 Galaxian in Lexis5

You should now have a game screen for a good game of Galaxian. A fine example of an easter egg that offers more gameplay.

Galaxian in Lexis Galaxian in Lexis2

Enjoy!


Programming tutorial: Part 16–Cartridges

Atari Lynx programming tutorial series:

In part 15 we discussed memory and segments and how those are related. Before we can go into the details of loading segments into memory, we need some background on the cartridges that the Lynx uses for storage of binary information. In this part we will look at the internals of cartridges and how to do raw reads from them.

Of ROM and RAM

Before we get started with the internals, it is worth pointing out a few peculiarities of the Lynx. In previous parts we touched on this, but now a refresher and some details are badly needed.

You see, the Lynx only has RAM. 64 KB of it. Read part 12 and 15 to find out how this is organized. Other systems have less RAM (sometimes) and use part of their address space to look “into” ROM cartridges. These systems have the luxury of memory mapped swappable ROM. For example, the Atari VCS 2600 only has 128 bytes (!) of RAM, while there is around 4KB of address space to read from ROM “memory” of the inserted cartridge.

99 games were developed for the device. They were supplied as thin card-style cartridges with a prominent edge to make them easier to remove
Photo: Alex Kidman

No such luck for the Lynx. It only has RAM and will need to read its code and other binaries into RAM from the peripheral device called the cartridge. The cartridge can be viewed as a read-only hard disk of some sort. Like a PC the Lynx will have to read data from the cartridge and store it in memory.

A side note: at one point in time Atari had the idea to read games from tape. There are still references to the tape, and some hardware addresses like MAGRDY0 ($FD84) are directly related to it. The timers 1, 3, 5 and 7 were also meant to be used for signalling the baud rate of the tape device.

It might seem that this is sort of limiting and that we drew the short straw with the Lynx. Nothing is further from the truth. The setup allows us to use a lot of RAM in any way we like. We are not tied to certain memory ranges that we must use. Additionally, we can have cartridges that are much larger than the available RAM. The Atari Lynx cartridges come in different sizes. The common ones are 128, 256 or 512 KB, although smaller and larger variations can and do exist. We get to choose how and when to load data from the cartridge and where to store it. Heck, you can even stream live from the cartridge as some libraries have already demonstrated. HandyMusic can play sound effects in PCM format straight from the cartridge. How nifty is that?

Physical structure of the cartridge

Even though the sizes vary, all cartridges have something in common: they have the same (maximum) number of 256 blocks. For each cartridge every block contains a fixed number of bytes. Two simple formulas give the total cartridge size from the block size of a cartridge and vice versa:

TOTALSIZE = 256 * BLOCKSIZE             – or –
BLOCKSIZE = TOTALSIZE / 256

This table helps find the right sizes:

Cartridge size (KB) # Blocks Blocksize (bytes) Pins
64 256 256  A0-A7
128 256 512 A0-A8
256 256 1024 A0-A9
512 256 2048 A0-A10
1024 256 4096 A0-A10+?

The 64 KB and 1 MB sizes are uncommon. No commercial cartridges of 64 KB or 1 MB were released during the Atari age.
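The formulas and the pin column of the table can be checked with a small host-side C sketch. The function names are invented for this example; the only facts used are the 256 blocks per cartridge and the rule that a block of 2^n bytes needs n address lines starting at A0.

```c
#include <assert.h>

#define CART_BLOCKS 256UL  /* every cartridge has 256 blocks */

/* BLOCKSIZE = TOTALSIZE / 256, with both sizes in bytes. */
unsigned long cart_block_size(unsigned long total_size)
{
    assert(total_size % CART_BLOCKS == 0);
    return total_size / CART_BLOCKS;
}

/* Number of ripple counter lines (A0 and up) needed to address one block. */
unsigned cart_low_address_lines(unsigned long block_size)
{
    unsigned lines = 0;
    /* block sizes are powers of two */
    assert(block_size > 0 && (block_size & (block_size - 1)) == 0);
    while ((1UL << lines) < block_size)
        ++lines;
    return lines;
}
```

For a 128 KB cartridge, cart_block_size(128UL * 1024) gives 512 bytes, and cart_low_address_lines(512) gives 9, which corresponds to the lines A0 through A8.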

To give you a visual impression of the cartridges and their sizes, you can take a look at the picture below. It depicts the blocks and their sizes.

image

No matter how you look at the cartridges, their behavior is always that of a stream of bytes starting somewhere within the cartridge’s binary image and continuing for as long as you are reading bytes.

Close connections of console and cart

The Lynx and the inserted cartridge are connected to each other through a large flat connector that sits inside the Lynx console.

image image image
Pictures from http://www.flickr.com/photos/consolingmyself/sets/72157628031292511

The connector passes the pins to a couple of signals and pieces of electronics on the Lynx motherboard:

  1. VCC +5V
  2. 74HC4040 (12-stage binary ripple counter; generates cartridge addresses A0-A10)
  3. 74HC164 (8-bit serial-in, parallel-out shift register; generates cartridge addresses A12-A19)
  4. Data lines (8-bit lines that go on the data bus)
  5. Ground
  6. Auxiliary Data Input/Output (aka AUDIN, not to be confused with Audio IN)

As a programmer you must know about the ripple counter and the shift register. These two pieces of hardware together build the cartridge address you are reading the data from.

image

It works like this: the shift register builds the high part of the cartridge’s address. It can target 256 different values, that correspond to the 256 blocks (or pages) of the cartridge. The lower part of the address is created from the ripple counter. That counter will start counting at value 0 and auto-increment after every read from the cartridge.

Different sized cartridges have different wirings of the A0 to A20 lines. More precisely, smaller cartridges do not have all pins from A0 to A10 connected. They only wire from A0 up to whatever they need. A 64KB cartridge needs to be able to address 65536 bytes, which requires 16 bits. Eight of those bits come from the shift register that selects the block, so it is sufficient to connect the wires A0 to A7 for the remaining eight.

image

Look back at the table above and find out what the pins for each cartridge size are.

The data lines that are present will hold the byte value from the cartridge. Reading it will pulse the CE0/ line on the cartridge, advancing bank 0 to the next byte in the ripple counter. And so on.

The AUDIN pin is used heavily on custom cartridges as follows:

  1. An extra address line for 1 MB cartridges, giving a virtual A20 line for large enough EEPROMS.
  2. A bank-switching bit that allows switching between more than one bank (two usually) to increase the maximum amount of data in the cartridge to 1 MB as well.
  3. Enable bit for EEPROM carrying cartridges, such as Lynxman’s Flashcard. By setting this bit high and using special address lines, you can write to the EEPROM, effectively saving (limited amounts of) data to the cartridge. The EEPROM is a separate chip and has around 128-512 bytes of storage.

Shifting and rippling

The block and position selector need to be prepared to read the intended data from the cartridge. Each requires a different approach. The block selection is performed by bit-shifting the right block number into the shift register. The position within the block is prepared by performing strobes, which in turn requires dummy reading, so the ripple counter (automatically) increases to the right position in the block.

The shift register is like a conveyor belt of bits. You need to place a bit on the belt, advance the belt one position and place the next bit. After performing this 8 times you are certain to have the right block selected. There are two registers involved in performing this bit shifting: $FD8B (IODAT) and $FD87 (SYSCTL1).

image

IODAT is a hardware register that has a bit for the data to be placed into the shift register. Bit 1 of this address is the Cart Address Data output bit. IODAT is a weird register, in that it can be read from and written to, but the individual bits provide either input or output access, so it will only make sense when used appropriately. You can control what direction (input or output) each bit has by setting it in the IODIR ($FD8A) register. It is sort of similar to the way the MAPCTL register determines whether to use RAM or hardware registers based on the bits you set. You need to set the direction for bit 1 of IODAT to 1 for output. After that, by setting that same bit 1 of IODAT you determine what the next bit on the shift register for the cartridge address is. The shift register is advanced by strobing bit 0 (called CartAddressStrobe) of SYSCTL1. A strobe means that the value of the bit changes from 0 to 1 and back to 0 again. Although the shifter will accept the data at the rise of the strobe bit, it must be set back to 0, as the high level of the bit is used to reset the ripple counter.

Here’s the general flow:

  1. Turn on cartridge power
  2. Set directions on IODIR
  3. Set bit 1 of IODAT to value of current address part (in order from A19 to A12)
  4. Strobe bit 0 of SYSCTL1 (write 1 then 0, assuming you start with a zero value)
  5. Repeat from 3 until all 8 bits have been set.

In assembler code this looks like the following:

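lynxblock:
	pha			; save A, X and Y
	phx
	phy
	lda __iodat		; shadow copy of IODAT
	and #$fc		; clear bits 0 and 1
	tay			; Y = value with address data bit (bit 1) = 0
	ora #2
	tax			; X = value with address data bit (bit 1) = 1
	lda _FileCurrBlock	; A = block number to select
	inc _FileCurrBlock	; the next call selects the following block
	sec			; carry acts as end marker behind the 8 bits
	bra @2
@0:	bcc @1			; bit is 0: Y was already written
	stx IODAT		; bit is 1: set the address data bit
	clc
@1:	inx			; strobe bit 0 high
	stx SYSCTL1
	dex			; strobe bit 0 low again
@2:	stx SYSCTL1
	rol			; next (highest) bit into carry
	sty IODAT		; default: address data bit = 0
	bne @0			; loop until the marker bit has left A
	lda __iodat		; restore IODAT from the shadow copy
	sta IODAT
	stz _FileBlockByte	; bytes left in block, kept as a negative count
	lda #<($100-(>__BLOCKSIZE__))
	sta _FileBlockByte+1
	ply			; restore Y, X and A
	plx
	pla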

The code above is from the CC65 implementation for selecting a block, actually. It shows a couple of things when we forget about the details and the optimizations.

  • Notice the two stores to SYSCTL1 for strobing bit 0 to advance the shift register.
  • The accumulator holds the block number to select. The rotation of the accumulator moves the next (highest) bit into the carry flag. The first store to IODAT puts the value 1 in the CartAddressData bit, if that carry flag was set. The second store always puts a 0 in as the default, allowing the first store to be skipped for a zero bit.
  • At the end of the routine the remaining number of bytes in the block is set. We need this to determine the block edge transitions later on.
  • The __iodat value is used to get the current value of the IODAT register. Two shadow variables are declared to hold the values that were written to the registers IODAT and IODIR, conveniently called __iodat and __iodir (with double underscores). The shadow values are needed, because we can never read the values back from the registers, but might want to inspect them later on.
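Stripped of the optimizations, the bit shifting can be modelled in portable C. The sketch below runs on a PC, not on the Lynx: cart_shifter_t is a made-up stand-in for the shift register plus the two bits that drive it, and it ignores everything else in IODAT and SYSCTL1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shift register and the two bits that drive it. */
typedef struct {
    uint8_t shift;   /* current contents: the block number (A12-A19) */
    uint8_t data;    /* models bit 1 of IODAT (Cart Address Data) */
    uint8_t strobe;  /* models bit 0 of SYSCTL1 (CartAddressStrobe) */
} cart_shifter_t;

/* Write the strobe bit; the shifter takes in the data bit on the rising edge. */
void shifter_write_strobe(cart_shifter_t *s, uint8_t level)
{
    if (level && !s->strobe)
        s->shift = (uint8_t)((s->shift << 1) | (s->data & 1u));
    s->strobe = level ? 1u : 0u;
}

/* Shift in a block number, highest bit (A19) first. */
void shifter_select_block(cart_shifter_t *s, uint8_t block)
{
    int bit;
    assert(s->strobe == 0);           /* start from a low strobe */
    for (bit = 7; bit >= 0; --bit) {
        s->data = (uint8_t)((block >> bit) & 1u);
        shifter_write_strobe(s, 1);   /* rising edge: bit is taken in */
        shifter_write_strobe(s, 0);   /* back to 0 for the next bit */
    }
}
```

After eight strobes whatever was in the register before has been pushed out, which is why the routine never needs to know the previously selected block.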

For completeness’ sake it is worth mentioning that in the startup code of any CC65 compiled program the IODAT and IODIR get initialized. They are set to $1B and $1A respectively. You can find the fragment for the shadow variables in crt0.s:

ldx     #$1b
stx     __iodat
dex;  $1A
stx     __iodir

The initialization of the actual registers IODAT and IODIR is done using a longer list of initialization values for Mikey. This is also in crt0.s (around line 40):

MikeyInitReg:  .byte $00, …, $50, $8a, $8b, $8c, $92, $93
MikeyInitData: .byte $9e, …, $ff, $1a, $1b, $04, $0d, $29

After having prepared the shift register it is a matter of reading data from the $FCB2 (RCART0) address. The $FCB3 (RCART1) address is used for reading from the second bank that might be present. Usually there is only one bank (bank0) on a cartridge. When reading from RCART0 the strobe CART0/ is used and it will advance the ripple counter to the next value, essentially autoincrementing the cart’s current address.

Reading from cartridges

Let’s assume that the data we want to read from the cartridge is located somewhere like this:

image

The picture shows the data starting in the middle of the second block and continuing into the fifth block. The way to read this data is to advance the high part of the cart address to the second block, then dummy read until the starting point of the data is reached. A thing to remember is that the blocks have a certain size. This is relevant for two reasons.

  1. You need it in calculations of the desired block if all you have is the consecutive byte number. The ripple counter is automatically set to zero by changing the high part of the cartridge address (because this requires using the strobe for the bit shifter which resets the counter). This also means that you need to do the correct number of dummy reads, also determined by the blocksize.
  2. When crossing the boundary of a block you need to increase the block number to read the right data. If you do not do that, you will read data from the same block again. That’s why you need to keep track of how many bytes are left in the current block. When it reaches zero you must increase the block number by shifting in 8 bits again.
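Both points can be tried out on a PC with a small model of the cartridge. This is a sketch under assumptions: cart_t and its functions are hypothetical names, the image is a plain byte array, and the hardware is reduced to a block number plus a counter that resets on block selection and wraps inside the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cartridge: an image plus the two address parts. */
typedef struct {
    const uint8_t *image;   /* raw cartridge contents */
    unsigned block_size;    /* bytes per block, e.g. 1024 */
    unsigned block;         /* high part, held by the shift register */
    unsigned counter;       /* low part, held by the ripple counter */
} cart_t;

/* Selecting a block also resets the ripple counter to zero. */
void cart_select_block(cart_t *c, unsigned block)
{
    c->block = block & 0xFFu;   /* the shift register holds 8 bits */
    c->counter = 0;
}

/* One read returns a byte and advances the ripple counter. Past the end
   of the block the counter wraps to the start of the same block. */
uint8_t cart_read_byte(cart_t *c)
{
    uint8_t value = c->image[(size_t)c->block * c->block_size + c->counter];
    c->counter = (c->counter + 1) % c->block_size;
    return value;
}

/* Position on a byte offset: select the block, then do dummy reads. */
void cart_seek(cart_t *c, unsigned long offset)
{
    unsigned long skip = offset % c->block_size;
    assert(offset / c->block_size < 256);
    cart_select_block(c, (unsigned)(offset / c->block_size));
    while (skip--)
        (void)cart_read_byte(c);
}

/* Read count bytes, moving on to the next block at every block edge. */
void cart_read(cart_t *c, uint8_t *dest, size_t count)
{
    unsigned left = c->block_size - c->counter;  /* bytes left in block */
    while (count--) {
        *dest++ = cart_read_byte(c);
        if (--left == 0) {
            cart_select_block(c, c->block + 1);
            left = c->block_size;
        }
    }
}
```

Leaving out the cart_select_block call in cart_read reproduces the mistake described in the second point: the reads simply start over at the beginning of the same block.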

We could do this using assembler, but that has been done. There is also a higher level abstraction from C. The functions lseek and read will do what we want, ... sort of. This is what the functions look like (from the unistd.h include file):

int __fastcall__ read(int fd, void* buf, unsigned count);
off_t __fastcall__ lseek(int fd, off_t offset, int whence);

The lseek function takes three parameters, of which only one is relevant. The file descriptor fd is always 1 for the Lynx, and for whence only SEEK_SET is supported. That leaves the offset you want to have into the cartridge. The offset is passed using the type off_t, but it is in fact a long integer that can hold the large zero-based offset from the beginning of the cartridge. In a 2MB cartridge this might actually be as large as 2^21-1 = 2097151.

lseek will set the shift register to the correct value and advance the ripple counter to the start of your data, both depending on your block size. It supports the 512, 1024 and 2048 byte block sizes.

off_t offset = 0;
lseek(1, offset, SEEK_SET);

You might need to calculate the offset from the block number and offset in the block. That’s kind of silly, because the lseek implementation does the reverse. It’s just the way lseek is defined.
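The calculation in both directions is short enough to spell out. The helper names below are invented for this sketch and are not part of the CC65 library; on the Lynx the result of cart_offset would be what you pass as the offset argument of lseek.

```c
#include <assert.h>

/* Offset to pass to lseek for a given block and position within the block. */
unsigned long cart_offset(unsigned block, unsigned position, unsigned block_size)
{
    assert(block < 256 && position < block_size);
    return (unsigned long)block * block_size + position;
}

/* The reverse split: which block an offset falls in ... */
unsigned cart_block_of(unsigned long offset, unsigned block_size)
{
    return (unsigned)(offset / block_size);
}

/* ... and how many bytes into that block it is (the number of dummy reads). */
unsigned cart_position_of(unsigned long offset, unsigned block_size)
{
    return (unsigned)(offset % block_size);
}
```

With the default block size of 1024 bytes, the middle of the second block (block number 1, position 512) comes out as offset 1536.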

You need to include the headers stdio.h (for the SEEK_SET constant), unistd.h (for the function prototypes of lseek and read) and sys/types.h (for the off_t type).

The actual reading is performed by calling read.

unsigned char buffer[256];
read(1, &buffer, 256);

You need to have a buffer that is going to hold the data read from the cartridge. The example above shows a 256 byte buffer and reads a 256 bytes sized chunk from the cartridge into it.

Here is an example of how to read the contents of your cartridge and dump it to the screen in multiple pages:

void dump_cartridge()
{
  off_t offset = 0;
  unsigned char buffer[80];
  unsigned char byte;
  char text[4];
  unsigned char page = 0, index = 0, x = 0, y = 0;


  // Advance current address to start of cartridge
  lseek(1, offset, SEEK_SET);

  do
  {
    // Read 80 bytes for one page into buffer
    read(1, &buffer, 80);
    index = 0;
    tgi_clear();

    // Draw all values
    for (y = 0; y < 10; y++)
    {
      for (x = 0; x < 8; x++)
      {
        itoa(buffer[index++], text, 16);
        tgi_outtextxy(x * 20, y * 10, text);
      }
    }
    tgi_updatedisplay();
    wait_joystick();
  } while (++page < 3);
}

This example uses a loop that will read 80 bytes for a page into the buffer, then display them in hexadecimal value 8 byte per line for a total of 10 lines. Pressing any normal button on the Lynx advances the current page.

image

You should try this: open the LNX file using the Binary Editor (right-click your tutorial-cartridge.lnx file, then select Open With…) and compare the contents you see with what is displayed. Mind you, you have to skip 64 bytes of the LNX files. We’ll explain later why that is.

The example is included in the sample source code. It can easily be expanded to allow you to select the current block number and start from there.

Next time

This part showed how the cartridge system works at a low level and with the CC65 functions lseek and read. The Lynx cartridge can also use a simple file system with a directory and file entries. In the next part we will look at how files can be read, and how this relates to the memory segments we saw in part 15. Till then.


Programming tutorial: Part 15–Memory and segments

Atari Lynx programming tutorial series:

In part 12 we covered memory in the Lynx for the first time. By now you may have run into memory limitations while building your Lynx games. Admittedly, 64KB of RAM is not really much, especially considering that considerable amounts of the memory are required for the Lynx hardware.

Atari Lynx memory layout

The layout of the Lynx’s memory varies over time. At startup, there is nothing really loaded and the memory layout resembles this:


The green areas are required by the hardware, meaning both Suzy and Mikey. The Mikey 65SC02 requires zero page memory and a stack at $0000 and $0100 respectively. It’s yours to use, but only through zero page addressing and variables, plus by Push and Pull instructions that manipulate the stack. You cannot use these 512 bytes for any other purpose. This is the main reason that most programs get loaded at $0200, which is exactly after the stack’s memory. We already saw that $FC00 to $FCFF contains memory mapped registers for Suzy, and $FD00 to $FDFF likewise for Mikey. $FE00 to $FFF7 is the boot ROM area that is used for booting the Lynx.

After initialization of the Lynx hardware and the C and TGI libraries from the CC65 toolset the memory looks like this:


As you can see in orange, there are three new memory areas:

  • C stack
    The C library uses a stack of its own. This stack is usually 2KB large and is needed for more complex pieces of code (e.g. recursive functions).
  • Video buffers
    The TGI library initializes the video driver and will automatically allocate two buffers for video and do double buffering to avoid screen tearing and other weird effects during updates. One buffer requires 160*102 = 16320 pixels. Since each pixel can have one of 16 colors and requires only 4 bits to hold the pen index, the actual number of bytes is 8160 (or $1FE0 in hex). With two required buffers, that’s quite a lot of memory.

Excessive direct access

One thing may have struck you as odd: how come we can use the area from $FC00 to $FDFF where Suzy and Mikey’s hardware mapped registers are? Wouldn’t there be a conflict between the registers and the RAM address space? Would we have to do a lot of memory mapping tricks to make it work? Luckily, we do not have to reserve that area and no mapping of the memory is required. That would be way too complicated. No, the good thing is that the LCD panel gets its data directly from RAM memory… always. This feature is called DMA for Direct Memory Access. So, the video display will always read RAM, no matter where the video buffers are located.


That’s why it is better to overlay it on top of an area that is otherwise less (easily) useable. Hence, the Suzy and Mikey address spaces. We can simply leave it at the regular hardware space, so we can access the special registers. The RAM access is not needed, unless we want to draw directly into the video buffers. That is pretty unlikely.

The same will hold true for the collision buffer, should you want to use that. It will take another 8160 bytes and can be located anywhere. You probably want to lay it right before the first video buffer. That’s where TGI will place it if you use the tgi_setcollisionbuffer(1); call.

With the C-stack and the video buffers in place you are around 18 KB poorer in memory, 26 KB with a collision buffer as well. The bottom line is that in most cases (no collision detection) you can spend your memory from $0200 to $B837.
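The numbers are easy to verify with a few lines of C on a PC. The constants are the values mentioned in this part (a first video buffer at $C038, a 2 KB C stack, programs starting at $0200); the function names are made up for the sketch.

```c
#include <assert.h>

#define VIDEO_WIDTH     160u
#define VIDEO_HEIGHT    102u
#define BITS_PER_PIXEL    4u
#define VIDEO_BUFFER1 0xC038u   /* start of the first video buffer */
#define STACK_SIZE    0x0800u   /* 2 KB C stack */
#define RAM_START     0x0200u   /* right after zero page and the 65SC02 stack */

/* Bytes needed for one video (or collision) buffer. */
unsigned video_buffer_size(void)
{
    return VIDEO_WIDTH * VIDEO_HEIGHT * BITS_PER_PIXEL / 8u;
}

/* Usable RAM in bytes, optionally giving up one more buffer for collisions. */
unsigned usable_ram(int with_collision_buffer)
{
    unsigned size = VIDEO_BUFFER1 - RAM_START - STACK_SIZE;
    if (with_collision_buffer) {
        assert(size > video_buffer_size());
        size -= video_buffer_size();
    }
    return size;
}
```

Without a collision buffer this gives $B638 (46648) bytes, so the last usable address is $0200 + $B638 - 1 = $B837. With a collision buffer another 8160 bytes go, leaving 38488.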

Configure my memory

Let’s take a look at how this translates to the CC65 suite. The programs and games we write consist of C and assembler code. The cc65.exe compiler compiles the C code and generates assembler code from it. The assembler code (from C and your own) gets assembled by ca65.exe. We end up with a couple of object modules that need to be tied together by the linker. The object modules do not have exact addresses for memory just yet. They use placeholders to be flexible in the actual allocation in memory. The linker ld65.exe performs the connection of the modules and the choice of final memory locations based on the configuration of your memory areas as indicated in a configuration file.

Each specific area in memory has a few characteristics:

  • Start address
    The area is located from a specific address up in memory space. As an example, take the video buffer that has its start address at $C038.
  • Area size
    Each area is of a particular size. Sticking with the same example, the video buffers are both 8160 bytes in size.
  • Type of memory
    Some memory areas are read-only, write-only or read/write. In the Lynx we mostly deal with read/write memory, because all memory is located in the 64KB of RAM. Other systems have memory mapped cartridges that are ROM, i.e. read-only.

The linker uses configuration files to tie the individual parts of your program or game together. A configuration file holds information on the memory area and segments. The ld65 linker has built-in configurations for each of the known targets. The lynx has 4 built-in configurations:

  1. lynx
    The default configuration, which has the MEMORY section shown below. It adds a small boot loader and a required directory, plus a LNX header so it can run in Handy. The ROM image without the LNX header can be burned to an EEPROM or Flashcard and will produce a working cartridge.
  2. lynx-bll
    This configuration adds a BLL header to the output file, so it can be uploaded from a PC over a ComLynx cable using any of the cartridges that allow BLL uploads (e.g. SIMIS and Championship Rally).
  3. lynx-coll
    Essentially the same configuration as the default one. It claims an additional $1FE0 of memory for the collision buffer, before the first video buffer.
  4. lynx-uploader
    This configuration adds a special uploader area right before the first video buffer. The useable RAM area is reduced by a full $100 (supposedly because of alignment?). I believe this configuration file does not function correctly.

Shown below is a fragment of the default configuration that is used by the linker ld65.exe for the lynx target in case you did not specify your own configuration.

MEMORY {
  ZP:     file = "", define = yes, start = $0000, size = $0100;
  HEADER: file = %O,               start = $0000, size = $0040;
  BOOT:   file = %O, start = $0200, size = __STARTOFDIRECTORY__;
  DIR:    file = %O,               start = $0000, size = 8;
  RAM:    file = %O, define = yes, start = $0200,
          size = $BE38 - __STACKSIZE__;
}

You can see what a particular configuration is by running:

ld65.exe --dump-config lynx

where lynx is the configuration name. The source file for the default lynx configuration is called lynx.cfg. You can find it in your CC65 folder under the source code for ld65, presumably C:\Program Files\CC65\src\ld65\cfg. This configuration is compiled into ld65.exe, so changing the file has no effect. The other three configuration files are located in the same folder.

Focus on the start and size values in the MEMORY section for now. You should be able to recognize some of the numbers. Zero page (ZP) runs from $0000 to $00FF, for a total size of $0100. The user available RAM area starts at $0200 as we saw earlier and runs until $B837. The size is computed as follows:

VIDEOBUFFER1_START - STACKSIZE - RAM_START
= $C038 - $0200 - STACKSIZE = $BE38 - $0800 = $B638 (46648 bytes)

The default configuration file does not give you the origin of the numbers, just the correct (resulting) ones. The constants come from the symbols section of the same configuration file:

SYMBOLS {
  __STACKSIZE__: type = weak, value = $0800; # 2k stack
  __STARTOFDIRECTORY__: type = weak, value = $00CB;
  __BLOCKSIZE__: type = weak, value = 1024; # cart block size
  __EXEHDR__:    type = import;
  __BOOTLDR__:   type = import;
  __DEFDIR__:    type = import;
}

The stack size and start of directory are defined constants in the symbols section. These values can be used to define your memory areas and make them more flexible and less hardcoded. You could change the C stack size and make it bigger or smaller for your needs. All it takes is adjusting the value attribute of the __STACKSIZE__ symbol.

There are a couple of things in the memory and symbols section that do not make sense right now. We will get to them in time. For now, suffice to say that HEADER, BOOT and DIR are areas for respectively the Handy emulator’s LNX file header, the encrypted boot loader on the cartridge and the directory with file entries on the cartridge.

Define it for me

Notice how some of the memory areas use an attribute called define with a value of yes or no. Each memory area that has a define=yes will make the linker emit two values. For an area called AREA51 it will emit __AREA51_START__ and __AREA51_SIZE__ corresponding to the start address and size of the memory.

Other pieces of code may rely on these values to allow for a flexible layout of memory and the code that is tied to the memory layout. An example is the implementation of the C stack that depends on the location and size of the RAM memory area. We already saw that the __STACKSIZE__ is a constant in the symbols. But the implementation also relies on the final physical location. It uses the value of __RAM_START__ to indicate the start address. Later, the linker will emit this value because a memory area called RAM is defined. When linking together, all puzzle pieces fit together.

In a while you will see how the linker emitted values for these defines on memory areas can be very useful. For one, they allow you to create the file entries in the directory structure that each larger game cartridge will have.

Dividing in segments

The source code items you create get compiled, assembled and assigned to the memory areas. There’s another dimension to all this. The source code consists of elements such as executable code, variables and static data, that have different behavior and memory requirements. Similar elements are combined and grouped together into memory segments of a certain type. In general this allows for the protection of memory and programs residing in it. There are four segment types in C source code:

  1. Code
    Regular executable code. Normally, this cannot be altered and it resides in a read-only memory segment. If code is self-modifying it cannot reside in this segment.
  2. Data
    Refers to data that can be altered. This data comes from the global and static variables that you declare and initialize with values. These values are combined in the data segment. They can be found in the object module as binary values that get copied by the loader at the memory location of the variables, initializing them to the values you gave them. After that they can be altered, because the memory is for variables (after all).
  3. Read-only data
    Data, but this is not meant to be altered. It is reference data from constant valued variables (marked as const). Some examples of read-only data are binary data for images and music, and text strings containing messages.
  4. Bss (Block Start by Symbol)
    This is data that is uninitialized. It will have a zero or null value. There is no need for the object module to contain this data, just the location in memory. A simple routine can initialize the values, because it will be zero anyway.

That’s a lot of theory, so a real example with code might illustrate this a bit. Take the following code sample:

unsigned char a;
char b = 42;
const char text[] = "Hello, World!";

void example()
{
  int x, y = 1337;
}

The compiler will generate the following assembler code for this (showing relevant fragments):

.segment    "DATA"
_b:
  .byte  $2A

.segment    "RODATA"
_text:
  .byte  $48, $65, $6C, $6C, $6F, $2C, $20, $57, $6F, $72, $6C, $64, $21, $00

.segment    "BSS"
_a:  .res    1, $00

; ---------------------------------------------------------------
; void __near__ example(void)
; ---------------------------------------------------------------

.segment    "CODE"
.proc    _example: near

.segment    "BSS"
L0035:
  .res    2, $00
L0036:
  .res    2, $00

.segment    "CODE"
;
; int x, y = 1337;
;
  ldx     #$05
  lda     #$39
  sta     L0036
  stx     L0036 + 1
;
; }
;
  rts

.endproc

Hopefully you can make some sense of the transition from C to 6502 assembler. Notice how the segments for the various types of code and variables are declared. Each uses the .segment keyword combined with the quoted segment name. Since b was initialized it is placed in the data segment with its initialization value. Likewise, a was not initialized and can reside in the bss segment. The constant value for text is listed as the hex ASCII values in the read-only data segment. The code segment is used for the implementation of example.

What may come as a surprise is that y, despite its initializer inside the example function, is placed in the bss segment just like x. The reason is that y needs to be initialized on every call to example(), so it is not sufficient to have the value 1337 in the data segment. Instead the initialization is placed in the function itself and y is simply placed in bss, to save size in the binary image for the object module.

Choosing your segments

The names for the segments we just saw might seem arbitrary, but nothing is further from the truth. You chose them when you compiled your C source code. You don’t remember? Well, that’s because we never really discussed this. I will take you back to one of the first tutorials where we looked at the MAKE files for our projects. Here’s an excerpt from the lynxcc65.mak file:

CODE_SEGMENT=CODE
DATA_SEGMENT=DATA
RODATA_SEGMENT=RODATA
BSS_SEGMENT=BSS

SEGMENTS=--code-name $(CODE_SEGMENT) \
  --rodata-name $(RODATA_SEGMENT) \
  --bss-name $(BSS_SEGMENT) \
  --data-name $(DATA_SEGMENT)

# Rule for making a *.o file out of a *.c file
.c.o:
  $(CC) -o $(*).s $(SEGMENTS) $(CFLAGS) $<
  $(AS) -o $@ $(AFLAGS) $(*).s

The MAKE file defines macros for the four segments and gives them the names CODE, DATA, RODATA and BSS. They could have been anything you like, although changing them will force some adjustments in other places as well. The inference rule that builds .o object modules from C source code shows that the cc65.exe compiler takes the arguments --code-name, --rodata-name, --bss-name and --data-name to define the segment names used when compiling to assembler code. This makes the compiler emit the .segment "DATA" and similar pieces of assembler code we saw in the earlier fragment.

Every time you call the compiler cc65 you are free to pass different segment names. This allows you to choose the segment names for all C files that are compiled by that single command. As the SEGMENTS macro is like a global variable and the inference rule uses it every time it is triggered, it is fairer to say that the names apply to every C file that is built by your MAKE file and thus to your entire project as it is currently organized.

If you want a more fine grained control over the segments you have a few options:

1. Create more MAKE files

Each MAKE file will hold its own SEGMENTS redefinition. The lynxcc65.mak file is included by every MAKE file, and defines the SEGMENTS macro first. If you add your own (re)definition in your MAKE file (say fonts.mak), it will overrule the previous definition with your new one. A separate MAKE file can be triggered by calling:

cd fonts && $(MAKE) $* /f fonts.mak

assuming you have placed the items built by the fonts.mak MAKE file into a relative subfolder fonts.
The fonts.mak file should hold a new definition like this:

SEGMENTS=--code-name FONTS_CODE \
  --rodata-name FONTS_RODATA \
  --bss-name FONTS_BSS \
  --data-name FONTS_DATA

2. Include #pragma statements in your C source code

The cc65.exe compiler recognizes the #pragma statements listed below. Adding them at the top of your C file makes sure that all code inside that C file is compiled into the specified segment names. They can also be used midway through the code, but that means that the code before them gets compiled into the default segment names from the SEGMENTS macro, and the rest into the #pragma ones.

#pragma data-name ("FONTS_DATA")
#pragma rodata-name ("FONTS_RODATA")
#pragma code-name ("FONTS_CODE")
#pragma bss-name ("FONTS_BSS")

It is even possible to push the current (old) name for a segment onto a sort of stack with the push keyword. It seems unlikely that you will need this control any time soon.
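As a minimal sketch of the push variant, assuming the cc65 form `#pragma code-name (push, "NAME")` with a matching `#pragma code-name (pop)`: the segment names FONTS_CODE and FONTS_RODATA and both function names are made up for this example, and the segments would still have to exist in your linker configuration.

```c
/* Sketch: switch segments for one function and its table, then restore.
   FONTS_CODE, FONTS_RODATA and the function names are hypothetical.
   Compilers other than cc65 simply ignore these pragmas. */

#pragma rodata-name (push, "FONTS_RODATA")
#pragma code-name (push, "FONTS_CODE")

/* Intended for FONTS_RODATA when compiled with cc65 */
static const unsigned char glyph_widths[3] = { 4, 6, 8 };

/* Intended for FONTS_CODE when compiled with cc65 */
unsigned char glyph_width(unsigned char index)
{
  return glyph_widths[index % 3];
}

/* Restore the names that were active before the push */
#pragma code-name (pop)
#pragma rodata-name (pop)

/* Compiled into the previous (default) code segment again */
unsigned char widest_glyph(void)
{
  return glyph_width(2);
}
```

The advantage over a plain `#pragma code-name ("...")` is that you do not need to know what the previous segment name was in order to switch back to it.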

So far we discussed how you can control the segments for C code. In case you are writing assembler code yourself, you will need to specify the segments for the various types of code and variables yourself. You will use the .segment keyword, just like in the compiler generated assembler code, to do so. As a matter of fact, you were already using it without knowing it.

# Rule for making a *.o file out of a *.bmp file
.bmp.o:
  $(SPRPCK) -t6 -p2 $<
  $(ECHO).global _$(*B) > $*.s
  $(ECHO).segment "$(RODATA_SEGMENT)" >> $*.s
  $(ECHO) _$(*B) : .incbin "$*.spr" >> $*.s
  $(AS) -t lynx -o $@ $(AFLAGS) $*.s
  $(RM) $*.s
  $(RM) $*.pal
  $(RM) $*.spr

When a bitmap file was used to create the read-only SCB data for a sprite, it used an inference rule that generates a new assembler file containing the line

.segment "RODATA"

or whatever the read-only data segment is called by the RODATA_SEGMENT macro at that point in time. For example, when we did the robot.bmp file this gave the following robot.s assembler file:

.global _robot
.segment "RODATA"
_robot:
  .incbin "robot.spr"

It might require a different inference rule or a redefinition of RODATA_SEGMENT to place your sprite data in another segment.

Segments and areas

At this point you are probably wondering what all these segments and memory areas are all about. And maybe even how the two are related like I hinted at when I mentioned another dimension to memory. Get ready for it, here it comes.

Individual segments are assigned to a memory area. As memory areas can hold various types of memory, like read/write for RAM or read-only for ROM, certain segments should go into compatible memory areas. E.g., code segments can come from ROM, but data should always be assigned to RAM, as it requires read/write memory.

Typically (for the Lynx) related segments are assigned to the same memory area. The Lynx only has RAM memory to work with. Admittedly, the cartridges are like ROM, but a cartridge is accessed as a sequential stream that needs to be copied into RAM before it can be used.

The linker can work its magic for each of the segments that are assigned to memory areas. It can allow segments to be assigned to overlapping memory areas. This way we can have code and data in the same memory space at different times. By loading the required code and data at the appropriate time it will enable us to fit more code into our already constrained memory space.

The linker configuration file has a section for the segments and their mapping to memory areas. It tells the linker what type of code or data is in each segment and where to load the segment into memory. Here is a fragment from the default lynx configuration:

SEGMENTS {
  EXEHDR: load = HEADER, type = ro;
  BOOTLDR: load = BOOT, type = ro;
  DIRECTORY:load = DIR, type = ro;
  STARTUP: load = RAM, type = ro, define = yes;
  LOWCODE: load = RAM, type = ro, optional = yes;
  INIT: load = RAM, type = ro, define = yes, optional = yes;
  CODE: load = RAM, type = ro, define = yes;
  RODATA: load = RAM, type = ro, define = yes;
  DATA: load = RAM, type = rw, define = yes;
  BSS: load = RAM, type = bss, define = yes;
  ZEROPAGE: load = ZP, type = zp;

  EXTZP: load = ZP, type = zp, optional = yes;
  APPZP: load = ZP, type = zp, optional = yes;
}

The segments we have encountered so far are CODE, RODATA, DATA and BSS. CODE and RODATA are read-only segments as indicated by the type=ro attribute. DATA is read-write and BSS is of type bss, like you would expect. Each of these segments gets loaded into the RAM memory area. That much makes sense, as the RAM memory area is currently the only user memory.

There are a few other segments (ZEROPAGE, EXTZP, APPZP) defined that are used by the compiler for zero page variables. The segments at the top EXEHDR, BOOTLDR and DIRECTORY are for creating a binary image that you can run in Handy. The STARTUP, LOWCODE and INIT segments are for the C runtime to put stuff that needs to be in potentially special areas. Consider these a given for now.

Also, notice the fact that some segments are marked as optional, where others have the define=yes attribute. The former means that the segment might not actually be there and the linker will not complain when it is missing. The latter will make the linker emit values for the segment, similar to what it does for define=yes in memory areas. For a segment named FONTS_CODE with define=yes the linker creates the values __FONTS_CODE_LOAD__, __FONTS_CODE_RUN__ and __FONTS_CODE_SIZE__, and likewise for FONTS_DATA. These values will come in useful at a later time.

Some rules of engagement

You might think about renaming some of the areas and segments. Be careful though, because some things simply need to be present and named according to presets. As an example, the C runtime library depends on the RAM memory area to be present. It assumes that the C stack is located directly after the RAM area. It uses the generated values for __RAM_START__ and __RAM_SIZE__.

Next time

This was a pretty deep and theoretical part in the tutorial. It covered a lot of ground that was more computer science related and less specific for the Lynx. Nevertheless, it was necessary to tackle this, because a lot of other Lynx and CC65 specifics are related to it either directly or indirectly. Next time we will continue our investigation of segments and look into loading code and data into memory from cartridges. Till then.


Programming tutorial: Part 14–Timers

Atari Lynx programming tutorial series:

In the last part of the tutorial we looked at how the Lynx console uses UART and how the hardware behaves. Before we dig deeper into ComLynx and programming for it, we need to take a little detour to investigate timers. In this part we will cover the basics of timers, the hardware, how they work and get you started programming the timers.

Lynx and timers

The Atari Lynx has a customized 65SC02 processor called Mikey. One of the customizations is the addition of a set of timers. The Lynx has 12 timers inside of Mikey: 8 of these are “normal” timers and the other four are audio channels, which behave like timers but generate audio. We will look at the 8 regular timers first and are going to cover audio and channels in a later tutorial part.

First, what are the timers in the Lynx? There are a number of possible meanings to the word timer. The Lynx has countdown-timers, meaning they count down to zero. They have some characteristics and specific behavior.

The short story on timers

An activated timer counts down from a start value to zero at a specific pace. Once it reaches zero it will underflow and optionally cause an interrupt (IRQ) with the timer’s flag set in the interrupt status byte (available through INTSET or INTRST). Also, it might reload the counter to a backup value and continue counting down again.

And the long story with pretty pictures

A timer ticks down to zero at a certain frequency. It does so by reducing its counter value at the end of every time interval. That interval is called the source period. The timer keeps its current value for the length of the interval, before dropping by 1 at the end.

image

When the timer has reached zero, it is said to “expire” or “timeout”. It will expire at the end of the period. This means that when a timer starts counting down from 5, it will expire after 6 (not 5) periods of time. An expiring timer might trigger an IRQ and might reload. Both of these depend on the settings of the timer.

If a timer has reloading enabled, the value of the timer will change to the backup value (aka reload value) of the timer after it expires. The behavior of a reloading timer would look like this:

image

Note that the start value of a timer does not have to be the same as the reload value, as depicted above. I intentionally had it start at 3, instead of the reload value 5.

The timers all behave the same, with only a very small number of exceptions and special purposes for some of them. They share the following properties:

  • Count enable
    A timer can be turned on (enabled) or off (disabled). Only when it is enabled will it count down.
  • Source period
    The timer counts down one tick at a time. One tick takes an amount of time that is called the “source period”. The source period ranges from 1, 2, 4, 8, 16, 32 to 64 µs (microseconds).
  • Current count
    The timer has a current value or count that indicates how many periods are left for the timer to reach zero.
  • Reload enable
    When the reloading is enabled, the timer will reload once it reaches zero. Reloading means it will get a new current value higher than zero.
  • Backup (or reload) value
    When a timer has reached zero it will reload its counter to the backup value, provided reloading is enabled. The backup value must be higher than zero for the timer to count at all.
  • Timer done flag
    Once a timer has reached zero, it is done. The timer will remember that it is done, even when it is set to reload, by flagging a bit called Timer Done. It is possible to clear this flag.
    Important: an active timer that has the Timer Done flag set will not count down, unless it has reloading enabled. 
  • Interrupt enable
    By enabling interrupts, the timer will cause an IRQ when it underflows. Otherwise, the timer will simply expire, flag it is done and reload (if reloading is enabled), then continue counting.
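
The properties above can be captured in a small software model. This is only a sketch to make the behavior tangible on a PC, not an emulation of Mikey; the struct and function names are made up. It reproduces the earlier observation that a timer starting at 5 expires after 6 periods, and the rule that a done timer without reloading stops counting.

```c
#include <stdbool.h>

/* Software model of one countdown timer (illustration only) */
typedef struct {
  unsigned char count;   /* current count */
  unsigned char backup;  /* backup (reload) value */
  bool enabled;          /* count enable */
  bool reload;           /* reload enable */
  bool done;             /* Timer Done flag */
} sim_timer;

/* Advance the timer by one source period.
   Returns true when the timer expires at the end of this period. */
bool sim_timer_tick(sim_timer *t)
{
  if (!t->enabled) return false;
  /* A timer that is done and does not reload will not count */
  if (t->done && !t->reload) return false;

  if (t->count > 0) {
    t->count--;
    return false;
  }

  /* The count was zero for a full period: the timer expires */
  t->done = true;
  if (t->reload) t->count = t->backup;
  return true;
}

/* Number of periods until the next expiry, or -1 if the timer
   does not expire within 1024 periods. */
int sim_periods_until_expiry(sim_timer *t)
{
  int periods;
  for (periods = 1; periods <= 1024; periods++) {
    if (sim_timer_tick(t)) return periods;
  }
  return -1;
}
```

A timer with a start value of 3 and a backup value of 5 first expires after 4 periods and from then on every 6 periods, which matches the reloading picture shown earlier.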

More than one timer

The 8 timers of the Lynx are numbered from 0 to 7: timer 0, timer 1, all the way up to timer 7. Timers 0, 2 and 4 are special. Timer 0 and 2 are related to video and correspond to the dimensions and refresh rate of the LCD screen. Timer 4 is the baud rate generator of the UART, like we discussed in a previous part. The other timers are yours to use.

The timers can be used stand-alone, or linked together. The first speaks for itself. A standalone timer is a timer with its own properties and completely self-contained in its behavior. However, a linked timer will not have a source period defined in microseconds, but depends on the timer to which it is linked to count it down.

image

The picture shows how timer 3 is linked to timer 1. Whenever timer 1 expires it “ticks” the linked timer, number 3 in this case. It’s kind of like a countdown stopwatch. Imagine that timer 1 corresponds to seconds and timer 3 to minutes. Whenever the seconds timer 1 reaches zero it will cause the minutes timer 3 to reduce by 1.

image

Multiple timers can be linked in a chain according to the linking order. The linking order of the timers is:

image

This order is fixed, so timer 3 can be linked to 1 (i.e. timer 1 ticks timer 3), but to none of the other timers. Timer 7 ticks audio channel 0, which is a special kind of timer. Audio channel 0 links to 1, 1 to 2, and 2 to 3. Audio channel 3 ticks timer 1. The other chain is timer 0 to 2 to 4.

It is important to remember that each of these timers can be linked, but don’t have to be and usually are not. Except for timer 0 and 2 as you will see next.
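
The stopwatch analogy of a seconds timer ticking a minutes timer can be sketched in a few lines of plain C. This is a simplified model with made-up names, in which both timers always reload; it only shows how an expiring timer ticks the one linked to it.

```c
/* Simplified model of a reloading timer (illustration only) */
typedef struct {
  unsigned char count;   /* current count */
  unsigned char backup;  /* reload value */
} linked_timer;

/* Tick a reloading timer once; returns 1 when it expires */
static int tick(linked_timer *t)
{
  if (t->count > 0) {
    t->count--;
    return 0;
  }
  t->count = t->backup;
  return 1;
}

/* Run the first timer for a number of source periods and
   return how often the timer linked to it has expired. */
unsigned long run_linked(linked_timer *first, linked_timer *linked,
                         unsigned long periods)
{
  unsigned long expired = 0;
  while (periods--) {
    if (tick(first)) {          /* the first timer expired ... */
      expired += tick(linked);  /* ... so it ticks the linked timer */
    }
  }
  return expired;
}
```

With both timers set to count and reload from 59, the first one expires every 60 periods and the linked one expires once every 3600 periods, just like seconds and minutes.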

Video timers

The special timers 0 and 2 deserve a bit of extra explanation. These video timers should not be touched by you. They are initialized by the boot rom code and set to specific values. They are set up to give some additional help during the drawing of the screen and timing your code. The timers both have interrupts and reloading enabled. Again, do not change their settings! You have been warned.

Timer 2 corresponds to the frequency of screen refreshes. Once every screen refresh it will expire and generate an interrupt that usually goes by the name of the vertical blank (VBL) interrupt. It has a backup value of 104, which makes it expire after 105 ticks: 102 horizontal LCD lines plus 3 for vertical blank time (also referred to as the overscan on some other consoles). Timer 2 is set up to link to timer 0.

Now here comes some math. Take the regular refresh rate of 60 Hz. That’s 1/60 = 0.016667 seconds or 16667 µs per screen, which is also the time that timer 2 should take from reload to expiring at zero. For a screen that has 102 real + 3 virtual display lines, it means that the time per tick should be 16667/105 = 158.7 microseconds. That’s the time that timer 0 needs to expire. Given a source period of 1 µs (this is how it is set by the boot rom code) we can deduce that the reload value of timer 0 should be 158. That’s indeed what it is set to.
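
The same arithmetic as a small sketch in C, so you can try other refresh rates. The function and macro names are made up for this example; the results are rounded or truncated to whole microseconds.

```c
/* Timing of the video timers for a given refresh rate (sketch) */
#define LCD_LINES     102  /* visible lines */
#define VBLANK_LINES  3    /* lines of vertical blank time */

/* Time for one full screen in microseconds, rounded to nearest */
unsigned long frame_time_us(unsigned int refresh_hz)
{
  return (1000000UL + refresh_hz / 2) / refresh_hz;
}

/* Time for one line in microseconds, truncated to a whole number */
unsigned long line_time_us(unsigned int refresh_hz)
{
  return frame_time_us(refresh_hz) / (LCD_LINES + VBLANK_LINES);
}
```

For 60 Hz this gives 16667 µs per screen and 158 µs per line, the value mentioned for timer 0.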

There’s another Magic “P” value that is somehow related to this. The Epyx specification mentions a formula that takes the time a line needs to expire and computes this P value. It is important in the electronics of the hardware somewhere.

P = INT(((linetime - 0.5) / 15) * 4) - 1

For 60Hz the P value is known to be 41 (0x29, again from the Epyx documentation). With the inverse of the function

linetime = (P + 1) / 4 * 15 + 0.5

linetime turns out to be (41+1)/4*15+0.5 = 158 µs. That brought us right back to the expiry time of timer 0. Sounds reasonable.
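
For experimenting, here is the calculation as a sketch in C. The inverse is the expression used in the text; the forward function is my reconstruction by turning that expression around, so treat it as an assumption rather than the literal formula from the Epyx documentation. The function names are made up.

```c
/* Magic P value from the line time in microseconds.
   Reconstructed from the inverse below; an assumption. */
int magic_p(double linetime_us)
{
  return (int)((linetime_us - 0.5) / 15.0 * 4.0) - 1;
}

/* Line time in microseconds from the P value: (P+1)/4*15+0.5 */
double line_time_from_p(int p)
{
  return (p + 1) / 4.0 * 15.0 + 0.5;
}
```

Feeding in 158 µs gives 41 (0x29), and 41 gives back 158 µs.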

Hardware registers for timers

The properties of a timer are influenced by 4 hardware registers:

  1. TIMxBKUP
    The backup (reload) value of the timer. Whether this is used depends on the reloading setting of the timer (see CTLA).
  2. TIMxCTLA
    The static control byte of the timer, which I’ll refer to as CTLA from here on. This enables or disables the timer, reloading and interrupt, plus it has the source period selector.
  3. TIMxCNT
    The current value of the counter of the timer.
  4. TIMxCTLB
    This is the dynamic control byte. It has 4 bits that indicate the state of the timer. The most important one is the Timer Done bit.

In this list the x denotes the number of the timer. Each timer has these four bytes. E.g., timer 3 has TIM3BKUP, TIM3CTLA, TIM3CNT and TIM3CTLB.

The location of the hardware registers starts at $FD00 and continues to $FD1F, in groups of four consecutive bytes (BKUP, CTLA, CNT and CTLB) per timer. So, $FD00 to $FD03 for timer 0’s backup, static control, current count and dynamic control, then $FD04 to $FD07 for timer 1 all the way to $FD1C – $FD1F for timer 7’s bytes.
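
The layout boils down to a simple calculation: base address plus four bytes per timer plus the offset of the register. A minimal sketch, with names that are made up for this example:

```c
#include <assert.h>

/* Offsets of the four registers inside the group of one timer */
enum timer_reg { TIM_BKUP = 0, TIM_CTLA = 1, TIM_CNT = 2, TIM_CTLB = 3 };

#define TIMER_BASE 0xFD00u

/* Address of a register of timer 0 to 7 */
unsigned int timer_reg_address(unsigned int timer, enum timer_reg reg)
{
  assert(timer <= 7);
  return TIMER_BASE + timer * 4u + (unsigned int)reg;
}
```

For instance, TIM3CTLA is at $FD00 + 3 * 4 + 1 = $FD0D and TIM7CTLB is the last byte at $FD1F.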

The backup value and the current value are full 8-bit values. They range from 0 to 255 as an unsigned byte and do not really deserve much explanation. Both can be written to and read from. By writing to the backup value you set the counter for the timer upon reload. It usually does not have an immediate effect. Writing a byte to the count byte will immediately change the current value. It might be a good idea to disable a timer first before writing a new value into count.

Static and dynamic control

The other two bytes are more complicated. Both are composed of individual bits that have a specific meaning.

The static control has three Enable bits: for the timer itself, the reloading and the interrupt. When the value of the particular bit is a 1 (one) it is enabled. For 0 (zero) the specific function or behavior is disabled.

One bit is used to indicate that the Timer Done bit should be reset to zero. It is a write-only bit and when written to will clear the Timer Done bit in the dynamic control byte (more on the dynamic control bits below).

image

The bits 0-2 are used to select the source period of the timer. This table will help you find the right bits for your needs:

Bits Value Description
000 0 1 µs (microsecond)
001 1 2 µs
010 2 4 µs
011 3 8 µs
100 4 16 µs
101 5 32 µs
110 6 64 µs
111 7 Linking (linked to previous timer in link order)

Here’s an example of a particular static control value: writing 0x98 to TIM1CTLA. That is %10011000 in binary. You can see bits 7, 4 and 3 are set. Looking at the meaning of the bits, this means timer 1 fires interrupts (bit 7), reloads (bit 4) and is enabled (bit 3). The source period bits are 000, so that’s a 1 µs interval time for timer 1.

Another one: writing 0x4A (or %01001010 binary) to TIM5CTLA. This means that the Timer Done bit will be reset for timer 5, and it is started at a 4 microsecond source period (bits 010). It will not reload or fire an interrupt when it expires. For a count value of 199 the timer will expire after 200 periods, or 800 microseconds.
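
If you want to check such values without counting bits by hand, a small decoder helps. This is a sketch that runs on a PC; the struct and function names are made up. It uses the bit positions from the examples (bit 7 interrupt, bit 6 reset Timer Done, bit 4 reload, bit 3 count enable) and the source period table, where selector 010 stands for 4 µs.

```c
#include <stdbool.h>

/* Decoded view of a static control (CTLA) value */
typedef struct {
  bool irq_enabled;        /* bit 7: interrupt enable */
  bool reset_done;         /* bit 6: reset Timer Done (write-only) */
  bool reload_enabled;     /* bit 4: reload enable */
  bool count_enabled;      /* bit 3: count enable */
  bool linked;             /* bits 0-2 are 111: linked timer */
  unsigned int period_us;  /* source period in microseconds, 0 when linked */
} ctla_info;

ctla_info decode_ctla(unsigned char value)
{
  ctla_info info;
  unsigned int select = value & 0x07u;

  info.irq_enabled    = (value & 0x80u) != 0;
  info.reset_done     = (value & 0x40u) != 0;
  info.reload_enabled = (value & 0x10u) != 0;
  info.count_enabled  = (value & 0x08u) != 0;
  info.linked         = (select == 7u);
  /* Selector 0..6 doubles the period each step: 1, 2, 4 ... 64 */
  info.period_us      = info.linked ? 0u : (1u << select);
  return info;
}

/* Time until a timer starting at count expires: count + 1 periods */
unsigned long expiry_time_us(unsigned char count, unsigned int period_us)
{
  return ((unsigned long)count + 1UL) * period_us;
}
```

Decoding 0x98 reports interrupt, reload and count enabled at 1 µs. Decoding 0x4A reports reset Timer Done and count enabled at 4 µs.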

Then there is the dynamic control. It has the four lower bits that reflect the state of the timer dynamically. You typically do not write to dynamic control, but read from it. There’s one important bit in dynamic control that has a known function. It is the fourth bit (bit 3) that tells whether the timer has ever timed out (expired). You can inspect the individual bits with code like this:

(MIKEY.timer5.control2 & 0x08) == 0x08

The other three bits are Last Clock, Borrow-in and Borrow-out. The function of these bits is unknown to me. I do know that they are not emulated correctly in Handy or any of its derived emulators. Last Clock has frequently changing values at a rate comparable to the source period of the timer. The two borrow bits have a function that I couldn’t figure out yet. If anyone knows, feel free to comment. The bottom line is you probably only need the Timer Done bit anyway.

Yooh, Mikey! Program the timers already

Alright, we know enough now to do some programming. The first thing will be a little piece of code that creates a timer that will count down from 100 to zero. The include file _mikey.h has various handy definitions related to the Mikey hardware registers. These have been captured in a struct that reflects the layout of the Mikey address space (see the tutorial part on memory mapping) and its hardware registers. It also holds the structs for the timers:

/* Mikey structure definition */
typedef struct _mikey_timer {
  unsigned char reload;
  unsigned char control;
  unsigned char count;
  unsigned char control2;
} _mikey_timer;

This has the exact layout of the hardware registers per timer we discussed a moment ago. The only difference  is that CTLA and CTLB are named control and control2.

The struct for Mikey has the 8 timers starting from $FD00 like so:

struct __mikey {
  struct _mikey_timer timer0;       // 0xFD00
  struct _mikey_timer timer1;       // 0xFD04
  struct _mikey_timer timer2;       // 0xFD08
  struct _mikey_timer timer3;       // 0xFD0C
  struct _mikey_timer timer4;       // 0xFD10
  struct _mikey_timer timer5;       // 0xFD14
  struct _mikey_timer timer6;       // 0xFD18
  struct _mikey_timer timer7;       // 0xFD1C
  …
};

And finally, the include file lynx.h has this defined:

/* Define Hardware */
#include <_mikey.h>
#define MIKEY (*(struct __mikey *)0xFD00)

Essentially this creates an overlay of a struct over the hardware memory addresses, so they get convenient names and an entry point called MIKEY. We can refer to the timer registers by using MIKEY.timerx and naming the property of the timer.

MIKEY.timer1.count = 100;
MIKEY.timer1.control = 0x0E;

That gives you a timer that will go from 100 to 0 and expires. Since reloading and interrupts are not enabled nothing will happen except that the Timer Done bit gets set in the dynamic control byte.
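
Magic numbers like 0x0E are easier to read when you compose them from named bits. The names below are made up for this sketch and are not taken from the CC65 headers, so check whether your version of the headers already offers something similar.

```c
/* Bit masks and selectors for the static control byte (CTLA).
   Hypothetical names, for illustration only. */
enum {
  TIMER_PERIOD_1US    = 0x00,
  TIMER_PERIOD_2US    = 0x01,
  TIMER_PERIOD_4US    = 0x02,
  TIMER_PERIOD_8US    = 0x03,
  TIMER_PERIOD_16US   = 0x04,
  TIMER_PERIOD_32US   = 0x05,
  TIMER_PERIOD_64US   = 0x06,
  TIMER_LINKED        = 0x07,
  TIMER_ENABLE_COUNT  = 0x08,
  TIMER_ENABLE_RELOAD = 0x10,
  TIMER_RESET_DONE    = 0x40,
  TIMER_ENABLE_IRQ    = 0x80
};

/* Combine enable flags and a source period selector into a CTLA value */
unsigned char make_ctla(unsigned char flags, unsigned char period)
{
  return (unsigned char)((flags & 0xF8) | (period & 0x07));
}
```

With these, make_ctla(TIMER_ENABLE_COUNT, TIMER_PERIOD_64US) yields the 0x0E that was written to MIKEY.timer1.control above.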

When you use a single timer in this way, you will find out that even at the slowest setting (64 µs) and the highest reload (255), the expiry time of a reloading timer is still fast (64 * 256 = 16384 µs = 0.016 seconds). To get a more realistic timer you will have to link timers or use the VBL (and its interrupt). We are going to investigate the latter method in another part of the series. Linking is something we can do right now.

MIKEY.timer1.control = 0x1E;
MIKEY.timer1.reload = 255;
MIKEY.timer1.count = 255;

MIKEY.timer3.control = 0x1F;
MIKEY.timer3.reload = 255;
MIKEY.timer3.count = 255;

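With the setup above you have enabled timers 1 and 3 where timer 1 has a 64 µs source period and timer 3 is linked to timer 1. Both will count from 255 to zero, then reload to 255 again. In the draw routine of your program you can read the timer registers and print them. This fragment dumps the dynamic control byte of timers 1, 3 and 5 in hexadecimal; replace control2 with count to see the current count value: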

char text[20];
itoa(MIKEY.timer1.control2, text, 16);
tgi_outtextxy(10, 0, text);
itoa(MIKEY.timer3.control2, text, 16);
tgi_outtextxy(20, 0, text);
itoa(MIKEY.timer5.control2, text, 16);
tgi_outtextxy(30, 0, text);

You can also read and dump the other control bytes to the screen and see how they behave. Here’s a screenshot of what is included in the sample code for this part.

image

You can look at the code to see what it does and change it around to do some experiments. The line that says Timer5 done uses this piece of code:

tgi_outtextxy(95, 70, (MIKEY.timer5.control2 & 0x08) > 0 ? "Yes" : "No");

to mask out the Timer Done bit with 0x08 (bit 3 of CTLB).

A short remark on interrupts

In the Lynx interrupts are always (always) caused by timers. Keyboard and IO never generate them. The video related interrupts (HBL and VBL), plus the ComLynx interrupts for TX buffer ready and received char are generated by timers 0, 2 and 4. Each of these interrupts is enabled by setting bit 7 of the respective timer’s static control byte CTLA.

When we get to interrupts we will revisit timers and look at how the interrupts are generated by them. It is probably the most relevant function of a timer, as timers keep ticking regardless of what code is executing. Interrupts fit nicely into the picture and give the timers a purpose and good use. Without interrupts timers might not be as useful.

Right now you can enable the interrupts, but cannot handle them without knowing how to program an interrupt in CC65 (and assembler code). We will get there, don’t worry.

Next time

The next tutorial part returns us to the ComLynx functionality. We will dive into the ComLynx driver and how it can be used to transfer data across Lynx consoles, from Lynx to PC and vice versa. We needed this detour to timers to understand how timer 4 is used and can be configured. Till next time.


Creating a ComLynx to USB cable

The Atari Lynx consoles can be connected together with a ComLynx cable. I have written about this before and showed how they can also be chain-linked to connect up to 16 consoles. The commercial games only had support for up to 8 (Todd’s Adventures in Slime World, the only one with 8).

But, in the nineties Bastian Schick already developed a ComLynx to RS232 cable that allowed you to connect your Lynx to a PC with a COM port.

cl_232

Since I am no hardware or electronics expert I decided to go a different route: create a ComLynx to USB myself. That would also solve the problem of requiring a computer that still has a COM port. I know mine doesn’t have one. There has been talk about building a ComLynx to USB cable in the AtariAge Lynx forums before. GadgetUK managed to build one, as you can see in the pictures there. He also wrote a .NET program Zeus (with sources) that works excellently and allowed me to test-drive the cable.

Getting started

The Lynx uses UART to let the consoles talk to each other. There is a lot of material available on the Internet. You could even read a part of my tutorial that brings you up to speed on the Lynx and UART.

Let’s take a look at what a ComLynx cable looks like:

There are two male connectors and one female connector. You can read more about this in a previous blog post. Inside the ComLynx cable there are two wires. I had never opened up a ComLynx cable before. So, in the interest of science I cut open the cable at the end that has the single male connector. The idea was that I could still use the other end to link to more than one Lynx. Here’s the inside of the cable:

WP_002196 (2)

The cable shows two wires only. After some reading and measuring I came to this conclusion:

  1. Red: Corresponds to the +5V that the ComLynx uses for the high signal of the UART and is the combined receive (RX) and transmit (TX) signals
  2. White: Ground cable (GND)

Alright, that part was easy. The next part is to find some piece of electronics that can be used to connect to the PC via USB.

USB to UART hardware

Searching through eBay I selected the following USB to UART pieces of electronics that seemed to fit the bill: they have the required connectors (GND, RX, TX and optionally +5V) and are cheap (ranging from $2.60 to $8.05).

  1. Silicon Labs CP2102 USB connectivity bridge (driver)
  2. Prolific PL2303HX  (driver)
  3. FTDI FT232RL USB to TTL serial cable adapter (driver)

imageimageimage

I took the pictures from eBay to show what they are like. The first one is very bare, and uses a 6-pin connector. The second from Prolific has a casing and cable with a loose end. It’s the casing that is most interesting. Finally, the FTDI version has some neat connector-thingies at the end for each of the six loose wires.

Now, a thing to note about the Prolific one: it turned out that the chipset that is used in this connector is sometimes a fake Chinese one, not the original Prolific. The latest drivers from Prolific will detect and reject the fake chipset. The result is a Windows device that is detected, but lacking driver support.

image

In the properties of the device error code 10 is shown.

image

Apparently the older drivers did not have this fake chipset detection and worked OK. The older driver that was referenced in the eBay auction might help. Some more info I found here.

I do not know if mine is a fake one. There is also no support for Windows 8 for the HXA model, whether it is fake or genuine. I should have read the description better.

Installing USB drivers

The Silicon Labs and FTDI USB devices both installed pretty smoothly. Once inserted in your USB slot Windows will detect it and (attempt to) install its own drivers first.

image

image

Windows is really helpful here and offers a link to the download location of the manufacturer.

image

The Prolific device does not have an appropriate Windows driver as part of the OS installation.

image

Windows 7/8 will find drivers for the FTDI one. However, these are not suitable. You will need to download the appropriate drivers from the manufacturer’s website. I’ve included the links to the drivers in the list above. Windows 8.1 has the drivers for FTDI out-of-the-box.

Running Windows 8 the registered FTDI device showed up as a FT232R, indicating that the driver is not available yet.

image

After that I installed the Virtual Com Port (VCP) driver from FTDI and the Silicon Labs driver. The end result is two properly registered USB to UART devices.

image

Building the physical wire

With the USB devices and driver troubles out of the way there is nothing holding us back in that respect. Let’s connect the ComLynx cable to the USB device.

I came up with two strategies:

  1. Connect/solder cable to the USB device
    Since I already opened a cable I might as well connect it to the USB device itself.
  2. Keep ComLynx cable and device intact
    This means that the cable will not be cut and the device will not require any cutting, soldering or whatever.

Soldering away

Going with number 1 first I looked at the back of the FTDI device after opening up the USB case. There I found that it neatly shows what each pin is used for.

TODO: New pictures

So, I soldered the red cable for combined RX and TX to the two pins that were indicated as RXD and TXD. The white ground wire connects to GND. The end result with the casing assembled again looks pretty swell.

WP_002201 (2)

When I did the same for the Prolific device, I couldn’t test that wire with my Windows 8.1 machine, because of the aforementioned incompatibility of the Prolific chipset.

I had already used the Silicon Labs version and that turned out to work alright as well. The end result wasn’t as pretty as the previous one, so I took it apart again and built the one shown above. When I rebuild it using the Silicon Labs device I will post new pictures here.

Inside the ComLynx connector

Since the other strategy would not allow me to mutilate the original ComLynx cable or solder at the device, I had to sacrifice another thing: the ComLynx connector. Fortunately I have several of those lying around from all the broken Lynx boards I acquired over the years.

The next picture shows the loose connector’s back, front and inside.

image 

It may be kind of hard to see, but the back shows four pins:

  • Left side: a single pin that corresponds to RX and TX
  • Top side: again, a single pin for GND
  • Right side: two pins, of which I do not know the function (anyone care to comment?)

Here’s what the ComLynx cable looks like inserted into the connector.

image

I used a simple solder board to align the top and left pin to two little connectors that can hold the individual wires of the FTDI device’s cable. The two pins at the right side (left facing the front) were bent outwards, so they wouldn’t connect or interfere with the rest.

imageimage

You can see how I soldered them to the board.

With that done I could finally insert the ComLynx cable into the connector. All that was left was to hook up the beautifully colored wires of the FTDI device to the two little connectors at the right of the ComLynx connector.

The color scheme for the FTDI cable (in my case) was like this:

Color Function
Black GND
Blue CTS
Red +5V
Green TXD
White RXD
Yellow RTS

That meant that the white and the green should be at the bottom row and the black cable at either one of the top pins.

image

Well, maybe not as neat as the previous one, but you don’t have to ruin a perfectly fine ComLynx cable nor solder the original wire. I might leave this as is or take the USB device apart to go for strategy 1 with it. I think it is more practical to have a single cable, instead of two separate cables and an open electronic board.

Test driving the cables

Aah, yes, the testing. At this point you will have to wait for my tutorial series to catch up. I did the first part on ComLynx already. The next one will show how to program the Lynx for UART and will make extensive use of the cable.

Or, you can ask for me to do a write-up before that. Feel free to ask any questions. Good luck building your own cable.


Programming tutorial: Part 13–UART

Atari Lynx programming tutorial series:

In the previous parts we have looked mainly at graphics and memory. The Lynx has two other interesting features that set it apart from other handheld consoles: the sound, and the ability to connect up to 16 Lynxes and allow them to communicate with one another. In this part we will look into the details of ComLynx. ComLynx was the official name for the connection capability of one or more Lynx consoles. It will take more than one part to cover everything, so we will get started with the basics of serial communication and the hardware inside the Lynx. This helps to understand how everything works and gives valuable insights before we move up an abstraction level by using the CC65 serial driver.

Primer in UART and serial communication

Before we dive into ComLynx we need a good understanding of UART and the serial form of communication that comes with it. UART is short for Universal Asynchronous Receiver/Transmitter and is a piece of hardware (an integrated circuit) that can do serial communication over a small number of lines. Usually the physical communication interface between two such hardware components consists of a cable that has a couple of wires, each with a dedicated purpose. Typically these are a ground (GND), a receive (RX) and a transmit (TX) line at a minimum. Other lines (CTS, RTS, DSR, DTR) can help create more robust communication by allowing for handshakes and transmission control. The Lynx does not have these, so let’s steer clear of those.

The RX and TX lines can have a high voltage (e.g. +5V, although the exact voltage may vary) and a low voltage (0V, although low might also be a different voltage, potentially negative). Using the two voltage levels the lines can send bits by alternating high and low to indicate 1 and 0 respectively. It is similar to Morse code, where the short and long beeps are also two distinct signals that allow you to build characters. Where it differs is that in serial communication it is customary to have a particular transmission protocol.

The most common protocol used for communication defines a way to send/receive data and check that the data has arrived completely and without error. The terminology includes Mark (for the high signal, or 1) and Space (for the low signal, or 0). The idea is that each piece of data is surrounded by a start bit and stop bits. The start bit signals the start of the data to follow and serves as a sync point for the receiver to begin reading the data that follows using its internal clock. When the data has been sent, one or two stop bits follow to signal the end. The stop bits might be preceded with a parity bit that helps determine errors.

synchronous_transmission2(source)

The parity bit is something special. It can be used by the hardware to detect errors in some cases. It works like this. The bits in the data that have a high value (1) are counted. The parity bit is chosen to be 0 or 1 depending on the kind of parity check. With ‘even’ parity the parity bit is chosen so that the total number of 1 bits, parity bit included, is even. Similarly, with ‘odd’ parity the total must be odd. For example, say the data transmitted was 10011001. That’s four ones, which is an even number. With even parity the parity bit should be zero, otherwise the total number of ones including the parity bit would be odd. For odd parity it is necessary to choose the parity bit as one, resulting in 5 ones, which is odd. Whenever the received parity bit is incorrect for the chosen convention (odd or even), an error condition has been detected. Since the check is based on a single bit, it only catches errors that flip an odd number of bits; when an even number of bits is flipped the parity still matches and the error goes unnoticed. Not perfect, but still better than nothing.
Three other options for the parity bit exist: None, Mark and Space. None means that no parity bit is included. The stop bit(s) will immediately follow the data bits in that case. Mark parity always has a high parity bit, and Space has a low parity bit.
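
To make the parity rules a bit more concrete, here is a minimal sketch in plain C of how a sender could pick the parity bit. The names parity_mode, count_ones and parity_bit are made up for this illustration and are not part of CC65 or of the Lynx hardware. The None option is left out, because in that case no parity bit is sent at all.

```c
#include <stdint.h>

/* The four parity conventions that do send a parity bit. */
typedef enum { PARITY_EVEN, PARITY_ODD, PARITY_MARK, PARITY_SPACE } parity_mode;

/* Count the bits in the data byte that are 1. */
static unsigned count_ones(uint8_t data)
{
    unsigned ones = 0;
    while (data != 0) {
        ones += data & 1u;
        data >>= 1;
    }
    return ones;
}

/* Return the parity bit (0 or 1) that the sender appends to the data bits. */
unsigned parity_bit(uint8_t data, parity_mode mode)
{
    unsigned odd_ones = count_ones(data) & 1u;  /* 1 when the count is odd */

    switch (mode) {
    case PARITY_EVEN:  return odd_ones;        /* make the total even */
    case PARITY_ODD:   return odd_ones ^ 1u;   /* make the total odd */
    case PARITY_MARK:  return 1u;              /* always high */
    default:           return 0u;              /* PARITY_SPACE: always low */
    }
}
```

For the example byte 10011001 (0x99) this gives 0 for even parity and 1 for odd parity, matching the reasoning above.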

Each bit that is transmitted is defined by the high- or low-ness of the line for a period of time. That time, the bit time, is determined by the clock speed of the UART. You have probably come across the unit baud, which is often used interchangeably with bits per second. It was pretty common to refer to your modem speed in baud. E.g. my first modem was a Dynalink 14k4 modem, capable of transmitting 14400 bits per second. Strictly speaking, baud counts symbols (or tones) per second and a symbol can carry more than one bit, so baud and bits per second are not necessarily the same. For the UART in the Lynx each symbol is a single bit, so there baud and bits per second are equivalent.

UART in the Lynx

The UART of the Lynx is a circuit that lives inside of Mikey. The UART supports various baud rates, and has several settings for the parity bit. The baud rate is governed by the countdown value and frequency settings you provide to timer 4. The timer governs the pace at which the bits are transferred. All connected Lynx consoles must use the same baud rate to understand each other.

The data that is transmitted over the wire has the following 11-bit format:

image

The start bit and stop bit are always present and have values of 0 and 1 respectively. This is a common choice as it makes sure that there is always a transition between the stop bit and the next start bit (1 to 0). The data is sent with the least significant bit (LSB) first.
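
As an illustration of this layout, the following sketch packs one byte into an 11-bit frame and unpacks it again. It is plain C and only models the format described here; build_frame and parse_frame are hypothetical helpers, and the parity bit is passed in by the caller so that no assumption is made about how it was calculated.

```c
#include <stdint.h>

/* Build the 11-bit frame. Bit 0 of the result is the first bit on the wire. */
uint16_t build_frame(uint8_t data, unsigned parity)
{
    uint16_t frame = 0;                        /* bit 0: start bit, always 0 */
    frame |= (uint16_t)((uint16_t)data << 1);  /* bits 1..8: data, LSB first */
    frame |= (uint16_t)((parity & 1u) << 9);   /* bit 9: parity (9th) bit */
    frame |= (uint16_t)(1u << 10);             /* bit 10: stop bit, always 1 */
    return frame;
}

/* Unpack a frame. Returns 0 on success and -1 on a framing error. */
int parse_frame(uint16_t frame, uint8_t *data, unsigned *parity)
{
    if ((frame & 1u) != 0u || ((frame >> 10) & 1u) != 1u)
        return -1;  /* start bit is not 0 or stop bit is not 1 */
    *data = (uint8_t)((frame >> 1) & 0xFFu);
    *parity = (frame >> 9) & 1u;
    return 0;
}
```

Storing the first bit on the wire in bit 0 means that shifting the frame right one position at a time yields the bits in transmission order.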

The parity bit is set depending on a chosen setting. The Lynx supports the variations Odd, Even, Mark and Space. The omission of None means that the 9th bit is always sent. Not all of the parity settings are actually usable in a real-life scenario. For odd and even parity the bit is set appropriately. However, the hardware has a flaw that causes the parity calculation to include the parity bit itself. The Lynx can communicate with another Lynx just fine, because both have this bug. On the other hand, the Lynx will have a hard time communicating with non-Lynx devices (such as a PC with a serial port) that check the parity bit the normal way.

The Lynx’s UART has a TX, RX and GND line for send, receive and ground. The peculiar thing about the Lynx wiring of the cable is that the RX and TX are connected together inside the ComLynx cable.

image

The design choice to connect RX and TX together has a lot of consequences:

  • The hardware now has the simplest setup.
  • Whatever a Lynx transmits is also received by itself.
  • The Lynx can detect when there is something connected to the ComLynx port, because it will be “short-circuited”.
  • No hardware or software handshakes are possible.
  • When one Lynx talks, all others must listen to avoid transmission errors.

Inside the UART there is transmitter and receiver hardware. For transmitting the UART provides a holding register and a shift register. The holding register can hold the next byte that must be sent. The shift register pushes the actual 8 data bits over the wire, wrapping it with the start, parity and stop bits.

To send something you first put a byte in the holding register. The hardware transfers it to the shift register when it is empty and ready to accept the next byte. Then the shift register starts sending it out. In the meantime you can put the next byte in the holding register, because that has become empty after the byte was transferred to the shift register.

Interrupt me

The UART send and receive mechanism uses timer 4 as its clock to generate the bit rate and it can fire interrupts (IRQ) when sending or receiving. Indirectly the interrupts are a way of notifying you of a newly received byte or an empty holding register. In the interrupt handler you can check flags (TXRDY and RXRDY as we’ll see in a moment) to see whether a byte was sent or received. If so, you should put a new byte in the holding register to keep the outbound dataflow going, and you should read received bytes quickly enough, before the next one arrives. Should you be slow to put in a new byte to transmit, sending all the data takes longer than necessary. If you are too late to pick up a byte from the receive holding register, you will get an overrun: the receiver cannot place the new byte and some data is lost.

It’s your choice whether you want to use interrupts for sending and receiving or not. Using interrupts you can continue doing other stuff, so this is a reasonable option when you are performing other tasks (gameplay is an example) and want to send and receive data in the meantime during interrupt handling. On the other hand, should you have a dedicated part of your program/game that does send or receive only, you can use the send and receive flags to check whether you should use SERDAT.

A programmer’s look at UART

Now that we know what the hardware looks like and how it behaves, it is time to look at the programming side of things again. The Lynx has various hardware registers inside Mikey that have a memory address so you can change settings for the UART hardware and help send and receive data.

First of all, the UART uses timer 4 to serve as the baud rate generator. So, by setting the time of each timer tick and the countdown period you can indirectly specify the baud rate. The baud rate is calculated by determining the time it takes to send 8 bits when timer 4 has a particular speed. The baud rate calculation is as follows:

baud rate = f_timer4 / ((r_timer4 + 1) · 8)

where r_timer4 and f_timer4 are the reload (countdown number) value and frequency of timer 4. The countdown value is set to a minimum of 1, but the timer only triggers when it underflows (from 0 to a virtual -1), so it needs a minimum of two timer periods to trigger. That is why the formula uses the reload value plus one. The frequency is the inverse of the clock period and gives the number of clock ticks per second. The end result is the baud rate as it is frequently used (bytes per second) for devices, but not the official definition of bits per second.

Some examples:

Reload   Clock speed   Frequency   Baud rate
1        1 µs          1 MHz       1000000/((1+1)·8) = 62500 Bps
1        2 µs          500 kHz     500000/((1+1)·8) = 31250 Bps
12       1 µs          1 MHz       1000000/((12+1)·8) = 9615 ≈ 9600 Bps
207      2 µs          500 kHz     500000/((207+1)·8) = 300.5 Bps
255      64 µs         15625 Hz    15625/((255+1)·8) = 7.63 Bps
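
The calculation in these examples can be captured in a few lines of C. This is a sketch that only uses the reload value and the timer frequency as inputs; lynx_baud_rate and lynx_reload_for are hypothetical helper names, not CC65 functions, and the integer division rounds the result down.

```c
#include <assert.h>
#include <stdint.h>

/* Baud rate for a timer 4 reload value and a timer clock frequency in Hz. */
uint32_t lynx_baud_rate(uint8_t reload, uint32_t timer_hz)
{
    /* The timer triggers on underflow, so one period takes reload + 1 ticks. */
    return timer_hz / (((uint32_t)reload + 1u) * 8u);
}

/* Reload value that approximates the requested baud rate. */
uint8_t lynx_reload_for(uint32_t baud, uint32_t timer_hz)
{
    uint32_t divisor = baud * 8u;
    uint32_t ticks;

    assert(baud > 0u);
    ticks = (timer_hz + divisor / 2u) / divisor;  /* rounded reload + 1 */
    if (ticks < 2u)
        ticks = 2u;      /* the text gives 1 as the minimum reload value */
    if (ticks > 256u)
        ticks = 256u;    /* the reload value has to fit in 8 bits */
    return (uint8_t)(ticks - 1u);
}
```

With a 1 MHz timer clock, lynx_reload_for(9600, 1000000) comes out at 12, the same reload value as in the 9600 Bps row.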

The two most important memory locations are SERCTL (0xFD8C) and SERDAT (0xFD8D). The first, SERCTL, refers to the serial control register and allows you to change settings. SERDAT is where you will read or write the serial data.

SERCTL turns out to be a weird register. The behavior is totally different when writing to or reading from it. In other words, when you write a specific value, and then read it, you will probably get a different value from the one you wrote.


Bit 7
  Write: TXINTEN (Transmitter interrupt enable). With this enabled the interrupt bit for timer 4 will correspond to the transmitter ready bit (i.e. you can put a new character in SERDAT).
  Read: TXRDY (Transmitter buffer empty). The buffer to hold data is ready to accept another byte. You can write it to SERDAT.

Bit 6
  Write: RXINTEN (Receiver interrupt enable). With this enabled the interrupt bit for timer 4 will correspond to the receiver ready bit (i.e. a character was received and can be read from SERDAT).
  Read: RXRDY (Receive character ready). A character was received and can be read from SERDAT.

Bit 5
  Write: 0 (zero), reserved for future compatibility. No idea what they meant to keep compatible in the future.
  Read: TXEMPTY (Transmitter totally done). The transmitter has both an empty buffer and an empty shift register. All offered data has been sent completely.

Bit 4
  Write: PAREN (Parity bit enabled). The parity checking is enabled and the parity bit will be calculated according to the odd or even setting (see PAREVEN).
  Read: PARERR (Received parity error). The data that was received had a parity error, so the parity bit did not match according to the parity setting.

Bit 3
  Write: RESETERR (Resets all errors). Writing a value with this bit set will clear all three errors (parity, framing and overrun) should they be set.
  Read: OVERRUN (Received overrun error). The data in the receive holding register was not read quickly enough. The new data could not be delivered.

Bit 2
  Write: TXOPEN (1: Open Collector, 0: TTL driver). Choose between these two modes of transmission. A bug in the hardware causes the state of the output to be high after power up. The advice is to set the bit to Open Collector (1) to fix the problem.
  Read: FRAMERR (Received framing error). There was an error in the frame. That probably means that, counting from the suspected start bit, the stop bit was not received at the expected moment. It usually means that more than one Lynx is sending at the same time.

Bit 1
  Write: TXBRK (Sends a break). A break is sent for as long as this bit remains 1. It should be set for at least a 24 bit period according to the current baud rate. The specification mentions that a break is a start bit followed by 8 zero data bits, a parity bit and the absence of a stop bit at the expected time.
  Read: RXBRK (Break received). A break was received because the line stayed at the level of a 0 bit for at least 24 bits of transmit time.

Bit 0
  Write: PAREVEN (1: Even parity (or Mark), 0: Odd parity (or Space)). This parity is used for both sending and receiving. When PAREN is 1 these parity values are used. If PAREN is 0 then the value in parentheses is used (Mark or Space) as the value of the 9th bit.
  Read: PARBIT (9th bit). This bit reflects the parity bit of a received frame. It is set to the parity calculation when PAREN is 1. Or it is whatever PAREVEN is at the sender when PAREN is 0.
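
Because the same bit position means something different for a write and for a read, it helps to keep two sets of bit masks. The sketch below defines them locally, with the names and bit numbers taken from the table above. It does not rely on any CC65 header, and the two helper functions are hypothetical examples of how the masks might be combined.

```c
#include <stdint.h>

/* Meaning of the bits when WRITING to SERCTL. */
#define TXINTEN  0x80u  /* bit 7: transmitter interrupt enable */
#define RXINTEN  0x40u  /* bit 6: receiver interrupt enable */
#define PAREN    0x10u  /* bit 4: parity calculation enabled */
#define RESETERR 0x08u  /* bit 3: clear parity, framing and overrun errors */
#define TXOPEN   0x04u  /* bit 2: 1 = open collector, 0 = TTL driver */
#define TXBRK    0x02u  /* bit 1: send a break while set */
#define PAREVEN  0x01u  /* bit 0: 1 = even (or Mark), 0 = odd (or Space) */

/* Meaning of the bits when READING from SERCTL. */
#define TXRDY    0x80u  /* bit 7: holding register can take a byte */
#define RXRDY    0x40u  /* bit 6: a received byte is waiting in SERDAT */
#define TXEMPTY  0x20u  /* bit 5: holding and shift register both empty */
#define PARERR   0x10u  /* bit 4: parity error */
#define OVERRUN  0x08u  /* bit 3: overrun error */
#define FRAMERR  0x04u  /* bit 2: framing error */
#define RXBRK    0x02u  /* bit 1: break received */
#define PARBIT   0x01u  /* bit 0: received 9th bit */

/* Example write value: open collector, even parity, errors cleared,
   both interrupts left disabled. */
uint8_t serctl_setup_value(void)
{
    return (uint8_t)(TXOPEN | PAREN | PAREVEN | RESETERR);
}

/* Non-zero when a value read from SERCTL reports a receive error. */
int serctl_has_error(uint8_t status)
{
    return (status & (PARERR | OVERRUN | FRAMERR)) != 0u;
}
```

Note that a value written with serctl_setup_value() cannot be read back from the register. If your program needs to remember the current settings, it has to keep its own copy in a variable.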

A deeper look at the way the data is being transmitted will explain how the byte travels through the UART transmitter. It is important to realize that the TXRDY bit refers to the holding register only, while the TXEMPTY bit represents the empty state of both the holding and the shift register.

Here’s how the various states of the TXRDY and TXEMPTY bits change throughout the lifecycle of sending. The scenario shows how two bytes are sent in sequence from the start when no transmission has been done yet.

  • At first the holding register and sending shift register are both empty. TX is ready and empty.
  • A byte is loaded into the holding register by a write to SERDAT. TX is no longer ready nor empty.
  • The byte is transferred from the holding to the shift register. TX is ready for new input in the holding register.
  • The shift register is pushing out the bits of the data. In the meantime new data is loaded into the holding register.

The reverse is true for receiving data, except that the data is eventually put into the receive holding register, which can be read from SERDAT.
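
The transmit lifecycle described above can be mimicked with a small software model. To be clear, this is not code that runs against the real registers. It is a sketch in plain C with a made-up uart_tx structure that only shows when TXRDY and TXEMPTY would be set, under the assumption that the hardware behaves as described in this part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software model of the transmitter: one holding and one shift register. */
typedef struct {
    uint8_t holding;
    uint8_t shift;
    bool holding_full;
    bool shift_busy;
} uart_tx;

/* TXRDY: the holding register can accept another byte. */
bool tx_ready(const uart_tx *tx)
{
    return !tx->holding_full;
}

/* TXEMPTY: the holding and the shift register are both empty. */
bool tx_empty(const uart_tx *tx)
{
    return !tx->holding_full && !tx->shift_busy;
}

/* Models a write to SERDAT. */
void tx_write(uart_tx *tx, uint8_t byte)
{
    assert(tx_ready(tx));  /* the caller must wait for TXRDY first */
    tx->holding = byte;
    tx->holding_full = true;
}

/* Models the hardware moving a byte on when the shift register is free. */
void tx_load_shift(uart_tx *tx)
{
    if (tx->holding_full && !tx->shift_busy) {
        tx->shift = tx->holding;
        tx->shift_busy = true;
        tx->holding_full = false;
    }
}

/* Models the shift register finishing the frame it was sending. */
void tx_frame_done(uart_tx *tx)
{
    tx->shift_busy = false;
}
```

Starting from a zeroed uart_tx, calling tx_write, tx_load_shift and tx_write again walks through the four states listed above. A polling sender follows the same pattern: wait until TXRDY is set, write the next byte, and repeat.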

The UART will always check the parity bit on received data, whether PAREN is set or not. It does look at the PAREVEN bit, so for Even and Odd a calculation is done, but for Mark and Space it only compares the value of the bit to the setting. Any parity errors are always reported through the PARERR bit in a read from SERCTL. You can inspect the value of the parity bit through PARBIT by reading from the SERCTL register.

Groundwork done

OK, no coding just yet, but a lot of background on the UART in the Atari Lynx and how it works. This will help us get started with the ComLynx features. Next time we will look at the way CC65 provides serial communication and also how to dig a little deeper and get started with the interrupts and low level timers and registers.

Till next time.
