Chapter 11 SynDEx downloader specification

11.1 Context

SynDEx supports the efficient programming of parallel, distributed, heterogeneous architectures composed of several different types of processors and of communication media. From a user specification of an algorithm dataflow graph and of an architecture resource graph, and from characterized algorithm and architecture libraries, SynDEx automatically generates application-specific executive code for each processor. It also provides a makefile that automates the compilation and linking of each executive, and its downloading into the program memory of the corresponding processor.
Because separately programming the non-volatile program memory of each processor is impractical, SynDEx assumes that the only non-volatile resident program of each processor is a boot-loader (which may be very small and simple, or may rely on a large and complex operating system) that expects an executive to be downloaded from a neighbour processor through a communication medium. The single exception is the host processor, designated by the name root in the specified architecture graph, whose boot-loader expects all executives to be stored together in its local non-volatile memory.
Consequently, SynDEx computes, over the architecture graph, an oriented spanning tree rooted at the root processor, and generates in each processor’s executive the code needed to download the compiled executives through this tree, in a predetermined order which is also used to generate the makefile.
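As an illustration of how such a download tree can be derived from an architecture graph, here is a minimal sketch. The text does not say which algorithm SynDEx uses; the breadth-first construction, the function name download_tree, and the simplified links dictionary (processor name to neighbour names) are all assumptions made for this example.

```python
from collections import deque

def download_tree(links, root="root"):
    """Return {processor: [descendants]} for a tree rooted at `root`.

    `links` is a hypothetical, simplified architecture graph: it maps each
    processor name to the neighbours it shares a communication medium with.
    Breadth-first search is an assumption, not the documented SynDEx method.
    """
    tree = {root: []}
    pending = deque([root])
    while pending:
        proc = pending.popleft()
        for neighbour in links[proc]:
            if neighbour not in tree:
                # First time this processor is reached: proc becomes its ascendant.
                tree[neighbour] = []
                tree[proc].append(neighbour)
                pending.append(neighbour)
    return tree

links = {
    "root": ["p1", "p2"],
    "p1": ["root", "p3", "p2"],
    "p2": ["root", "p1"],
    "p3": ["p1"],
}
tree = download_tree(links)
print(tree)
```

In this example p2 is reachable both from root and from p1; the sketch attaches it to root because root is visited first, so each processor ends up with exactly one ascendant.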

11.2 Boot and download process

This process is the same for all processors, except that the root processor gets executives from its local non-volatile memory, whereas all the other processors get executives from their neighbour processor which is their ascendant towards the root of the download tree. The processors which have the same ascendant processor are called the descendants of that processor.
When powered on, each processor boots by executing its resident boot-loader which gets the processor’s executive, loads it into the processor’s program memory, and executes it. During its initialization phase, the executive gets and forwards executives to all its descendants, before proceeding with application data processing.
The root processor, usually an embedded PC or another kind of workstation, boots from its disk an operating system which automatically loads and executes a startup program allowing the user to choose between different applications. During early development, this program may be a simple shell (which requires a keyboard to be available): the user enters a make command to compile the executives if needed, and to execute the root executive, with the other executive files passed as arguments on the command line. In applications where it is impractical to keep a keyboard permanently connected, the startup program may use another input device (for example a switch or a touch screen) to let the user choose between different predefined shell commands, each starting a different application through the corresponding make command, or simply launching a shell for interaction with a keyboard. In more deeply embedded applications, where the root processor has neither a disk nor an operating system, all the executives are stored in a FLASH memory; the root processor boots by directly executing its own executive, and finds the other executives stored sequentially in its FLASH.
The first executive forwarded to a descendant is received, stored, and executed by that descendant’s boot-loader. Then, as long as that descendant’s executive asks for executives, the ascendant executive gets and forwards the next executives to the same descendant, until that descendant’s executive signals that it has no more executives to forward itself. The ascendant may then switch to its next descendant, until it has no more descendants to service, and hence no more executives to forward. This fully sequential download process boots processors in the order of a depth-first traversal of the download tree.
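The sequential process above can be simulated with a short recursive sketch. It only models the ordering, not the communications: the tree dictionary, the stream of executive names, and the function run_executive are hypothetical stand-ins introduced for this example.

```python
def run_executive(proc, tree, stream, boot_order):
    """Simulate the initialization phase of the executive running on `proc`.

    `stream` is an iterator over executive names, standing in for the data
    pulled from the ascendant (or from local storage on the root).
    """
    boot_order.append(proc)  # proc's executive is now running
    for child in tree[proc]:
        # The next executive in the stream is received by the child's boot-loader.
        executive = next(stream)
        if executive != child:
            raise ValueError(f"expected executive for {child}, got {executive}")
        # The child then pulls the executives of its own subtree through proc
        # before proc switches to its next descendant.
        run_executive(child, tree, stream, boot_order)

tree = {"root": ["p1", "p2"], "p1": ["p3"], "p2": [], "p3": []}
stream = iter(["p1", "p3", "p2"])  # executives stored in download order
boot_order = []
run_executive("root", tree, stream, boot_order)
print(boot_order)  # ['root', 'p1', 'p3', 'p2']
```

The simulation only succeeds when the executives are stored in depth-first order, which is why the same predetermined order has to be used when the executives are concatenated.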
In the case of a point-to-point medium, the descendant executive may proceed to application data communications as soon as it has no more executives to forward, whereas in the case of a multipoint medium, the descendant executive must wait until the ascendant executive signals that it has no more executives to forward (to avoid interference between descendant application data and ascendant download data).

11.3 Common download format

Each processor type may have a different compiler (linker) output format, and some processor types may have a ROM-ed embedded boot-loader (firmware) with its own requirements on the download format. The SynDEx common download format encapsulates the details and the differences of the compiler output formats and of the boot-loaders’ download formats; it is composed as follows:

a four-byte prefix encoding, as a 32-bit big-endian integer, the total length of the following sequence of bytes,
a sequence of bytes encoding one complete executive, structured as required by the destination boot-loader, and padded if needed with null bytes until the total length is a multiple of four.

Because the first executive forwarded to a descendant is received by that descendant’s boot-loader, it must be sent without its four-byte prefix. The following executives sent to the same descendant are forwarded by that descendant’s executive, so they must be sent with their four-byte prefix.
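A minimal sketch of this encapsulation follows. It assumes that the length field counts the padded sequence (the text speaks of the "total length"), and the function names pack_executive and strip_prefix are hypothetical.

```python
import struct

def pack_executive(payload: bytes) -> bytes:
    """Pad `payload` with null bytes to a multiple of four and prepend its length.

    Assumption: the 32-bit big-endian length counts the padded sequence.
    """
    padding = -len(payload) % 4
    body = payload + b"\x00" * padding
    return struct.pack(">I", len(body)) + body

def strip_prefix(record: bytes) -> bytes:
    """Return the byte sequence without its prefix, as sent to a boot-loader."""
    (length,) = struct.unpack(">I", record[:4])
    body = record[4:4 + length]
    if len(body) != length:
        raise ValueError("record shorter than its declared length")
    return body

record = pack_executive(b"\x01\x02\x03\x04\x05")
print(record.hex())                # 000000080102030405000000
print(strip_prefix(record).hex())  # 0102030405000000
```

The first executive sent to a descendant would go out as strip_prefix(record), and the following ones as the full record.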
The sequence of bytes itself must follow the format expected by the destination boot-loader. Therefore a linker post-processor must be developed for each processor type, to translate the linker output file into the SynDEx common download format described above. The outputs of all the post-processors are concatenated by the makefile into a single contiguous image (file), which the root executive uses as its source.
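To show how a root executive could walk through such a concatenated image, here is a sketch under stated assumptions: every executive in the image keeps its four-byte prefix, the length field counts the padded sequence, and the names build_image and iter_executives are hypothetical. The exact layout of the real image is not detailed in the text.

```python
import struct

def build_image(bodies):
    """Concatenate already padded byte sequences, each behind its length prefix."""
    return b"".join(struct.pack(">I", len(body)) + body for body in bodies)

def iter_executives(image: bytes):
    """Yield each byte sequence of the image in storage order."""
    offset = 0
    while offset < len(image):
        (length,) = struct.unpack_from(">I", image, offset)
        offset += 4
        body = image[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated image")
        yield body
        offset += length

bodies = [b"AAAA", b"BBBBBBBB", b"CCCC"]  # placeholder contents, already padded
image = build_image(bodies)
print(len(image))                    # 28
print(list(iter_executives(image)))  # [b'AAAA', b'BBBBBBBB', b'CCCC']
```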

11.4 Downloader macros

The downloader code is generated by two macros:

loadFrom_ starts the initialization phase of the communication sequence of the communication medium connected to the ascendant processor; its first argument is the name of the ascendant processor, and its next arguments, if any, are the names of the other communication media, connected to descendant processors;
loadDnto_ starts the initialization phase of the communication sequence of each communication medium connected to a descendant processor; its first argument is the name of the communication medium connected to the ascendant processor, its next argument(s) is (are) the name(s) of the descendant processor(s).

Processor names are useful to address processors connected to a multipoint medium: a processor name may be suffixed to give the name of a user-defined macro, whose substitution gives the processor address.
As executive data may be forwarded through several communication media of different bandwidths, transfers must be synchronized so that data flows at the speed of the slowest communication medium.
Between processors, if flow control is not supported by the communication medium hardware, it must be implemented by ready-to-receive control messages, sent by the loadFrom_ code for each chunk of data to be sent by the loadDnto_ code. Inside a processor, the cooperation between the loadFrom_ and loadDnto_ macros is based on the order in which the spawn_thread_ macros (one for each communication sequence, i.e. for each communication medium) are generated in the initialization phase of the main_ ... endmain_ sequence. The spawn_thread_ macro corresponding to the thread_ macro of the communication sequence starting with the loadFrom_ macro (i.e. of the medium connected to the ascendant processor) is called first. It is followed by the other spawn_thread_ macros, including those, if any, corresponding to the communication sequences with a loadDnto_ macro (i.e. of the media connected to the descendant processors).
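The ready-to-receive handshake between two processors can be illustrated with a small simulation. Two Python threads and two queues stand in for the ascendant, the descendant, and the medium; the message value "RTR" and the function names are assumptions made for this sketch, not part of the SynDEx macros.

```python
import queue
import threading

def ascendant(chunks, rtr, data):
    """Sender side (loadDnto_ role): send a chunk only after a ready-to-receive."""
    for chunk in chunks:
        rtr.get()        # block until the descendant announces it is ready
        data.put(chunk)

def descendant(count, rtr, data, received):
    """Receiver side (loadFrom_ role): announce readiness, then take one chunk."""
    for _ in range(count):
        rtr.put("RTR")   # hypothetical ready-to-receive control message
        received.append(data.get())

chunks = [b"aaaa", b"bbbb", b"cccc"]
rtr, data, received = queue.Queue(), queue.Queue(maxsize=1), []
sender = threading.Thread(target=ascendant, args=(chunks, rtr, data))
sender.start()
descendant(len(chunks), rtr, data, received)
sender.join()
print(received)  # [b'aaaa', b'bbbb', b'cccc']
```

Because the sender never has more than one unacknowledged chunk in flight, the transfer proceeds at the pace of the receiver, which is how a chain of such links ends up running at the speed of its slowest medium.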
If the processor is a leaf node of the download tree, its loadFrom_ macro has only one argument; in this case, it directly generates the code sending to the ascendant processor a "null" message meaning that no more executive is requested, followed, in the case of a multipoint medium, by the code waiting for other executives to be downloaded to the other processors connected to the communication medium, until the ascendant processor sends an "empty" executive meaning that the download process is complete on this communication medium.
Otherwise, before generating the code described in the previous paragraph, the loadFrom_ macro generates a RETURN instruction (which will return control after the CALL instruction generated by the spawn_thread_ macro), followed by a loadFrom_end_: label, and the loadFrom_ macro also defines three macros for use by the loadDnto_ macros:

the loadFrom_req_ macro must generate the code that sends a non-null message requesting the ascendant processor to download another executive;
the loadFrom_get_ macro must generate the code that receives one word of executive data from the ascendant processor; a word has the size of a processor register, usually 32 bits; if the communication medium transfers executive data in chunks of N words, then one call in every N to the code generated by the loadFrom_get_ macro receives a full chunk of data and returns its first word, and each of the next N-1 calls returns the next word of that chunk;
the loadFrom_next_ macro, which is called at the end of each loadDnto_ macro, must generate a CALL loadFrom_end_, but only for the very last loadDnto_ macro.
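The chunked behaviour required of the loadFrom_get_ code can be sketched as a small buffered reader. This is an illustration in Python rather than generated macro code; the class WordReader, the receive_chunk callback, and the big-endian interpretation of 32-bit words are assumptions.

```python
import struct

class WordReader:
    """Return one 32-bit word per call, receiving a new chunk every N calls."""

    def __init__(self, receive_chunk, words_per_chunk):
        self._receive_chunk = receive_chunk  # hypothetical: returns N * 4 bytes
        self._n = words_per_chunk
        self._words = []

    def get(self):
        if not self._words:
            # Buffer empty: receive a full chunk of N words from the ascendant.
            data = self._receive_chunk()
            self._words = list(struct.unpack(">%dI" % self._n, data))
        # Return the first word of a fresh chunk, or the next buffered word.
        return self._words.pop(0)

transfers = []

def fake_medium():
    """Stand-in for the medium: delivers consecutive 8-byte chunks."""
    start = 8 * len(transfers)
    transfers.append(start)
    return bytes(range(start, start + 8))

reader = WordReader(fake_medium, words_per_chunk=2)
words = [reader.get() for _ in range(3)]
print([hex(w) for w in words])  # ['0x10203', '0x4050607', '0x8090a0b']
print(len(transfers))           # 2
```

Three calls trigger only two transfers on the medium: the first and third calls each receive a chunk, and the second call is served from the buffer.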

If the code generated by any of these three macros is limited to a few instructions, it may be generated inline, otherwise the loadFrom_ macro generates this code as a subroutine (between the RETURN instruction and the loadFrom_end_ label), and a call to that subroutine is generated instead of the inline code.