SynDEx allows the efficient programming
of parallel, distributed, heterogeneous architectures,
composed of several different types of processors,
and of several different types of communication media.
From a user specification
of an algorithm dataflow graph and of an architecture resource graph,
and from libraries characterizing the algorithm and architecture components,
SynDEx automatically generates
application-specific executive code for each processor,
and provides a makefile
to automate the compilation and linking of each executive,
and its downloading
into the program memory of the corresponding processor.
Since programming each non-volatile program memory separately is impractical, SynDEx assumes that the only resident program in each processor's non-volatile memory is a boot-loader (which may be very small and simple, or may rely on a large and complex operating system) that expects an executive to be downloaded from a neighbour processor through a communication medium. The single exception is the host processor, designated by the name root in the specified architecture graph, whose boot-loader expects all executives to be stored together in its local non-volatile memory.
Consequently, SynDEx computes over the architecture graph a directed spanning tree rooted at the root processor, and generates in each processor's executive the code needed to download the compiled executives through this tree, in a predetermined order that is also used to generate the makefile.
This process is the same for all processors,
except that the root processor
gets executives from its local non-volatile memory,
whereas all the other processors
get executives from their neighbour processor
which is their ascendant towards the root of the download tree.
The processors which have the same ascendant processor
are called the descendants of that processor.
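The construction of the download tree is not detailed here. As an illustration only, a breadth-first search from the root processor is one standard way to obtain such a spanning tree, together with the ascendant and the descendants of each processor. This is a hypothetical Python sketch, not the SynDEx implementation: the `neighbours` adjacency map, the function name `download_tree`, and the processor names are assumptions.

```python
from collections import deque


def download_tree(neighbours, root):
    """Return (ascendant, descendants) maps for a BFS spanning tree rooted at root.

    neighbours maps each processor name to the processors it shares a
    communication medium with (hypothetical representation).
    """
    ascendant = {root: None}
    descendants = {root: []}
    queue = deque([root])
    while queue:
        proc = queue.popleft()
        for nb in neighbours[proc]:
            if nb not in ascendant:  # first time reached: attach nb under proc
                ascendant[nb] = proc
                descendants[nb] = []
                descendants[proc].append(nb)
                queue.append(nb)
    if len(ascendant) != len(neighbours):
        unreachable = sorted(set(neighbours) - set(ascendant))
        raise ValueError(f"processors unreachable from {root}: {unreachable}")
    return ascendant, descendants


if __name__ == "__main__":
    # Hypothetical architecture: root -- P1 -- P3 and root -- P2 -- P3
    arch = {
        "root": ["P1", "P2"],
        "P1": ["root", "P3"],
        "P2": ["root", "P3"],
        "P3": ["P1", "P2"],
    }
    asc, desc = download_tree(arch, "root")
    print(asc)   # {'root': None, 'P1': 'root', 'P2': 'root', 'P3': 'P1'}
    print(desc)  # {'root': ['P1', 'P2'], 'P1': ['P3'], 'P2': [], 'P3': []}
```

Breadth-first search keeps each processor as close as possible to the root in number of hops; other spanning trees would work just as well for the download protocol itself.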
When powered on, each processor boots by executing its resident boot-loader, which gets the processor's executive, loads it into the processor's program memory, and executes it. During its initialization phase, the executive gets executives and forwards them to all its descendants, before proceeding with application data processing.
The root processor, usually an embedded PC or another kind of workstation, boots from its disk an operating system that automatically loads and executes a startup program letting the user choose between different applications. During early development, this program may be a simple shell (which requires a keyboard), and the user enters a make command to compile the executives if needed and to execute the root executive, with the other executive files passed as arguments on the command line. In applications where it is impractical to keep a keyboard permanently connected, the startup program may use another input device (for example a switch or a touch screen) to let the user choose between different predefined shell commands, each starting an application through the corresponding make command, or simply launching a shell for interaction with a keyboard. In more deeply embedded applications, where the root processor has neither a disk nor an operating system, all the executives are stored in FLASH memory: the root processor boots by directly executing its own executive, and finds the other executives stored sequentially in its FLASH.
The first executive forwarded to a descendant is received, stored, and executed by that descendant's boot-loader. Then, as long as that descendant's executive asks for executives, the ascendant executive gets the next executives and forwards them to the same descendant, until that descendant's executive signals that it has no more executives of its own to forward. The ascendant then switches to its next descendant, and so on until it has no more descendants to serve, and hence no more executives to forward. This fully sequential download process boots the processors in the order of a depth-first traversal of the download tree.
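The fully sequential protocol described above can be simulated to check the resulting boot order. In this hypothetical sketch (the function name and the event representation are assumptions, not SynDEx code), each hop of an executive through the tree is recorded as an `("exe", sender, receiver, executive)` event, and each "no more executives to forward" signal as a `("done", processor, ascendant)` event.

```python
def download_schedule(ascendant, descendants, root):
    """Simulate the fully sequential download; return (boot_order, events)."""
    boot_order, events = [], []

    def path_from_root(proc):
        # Hops (sender, receiver) from root down to proc.
        hops = []
        while ascendant[proc] is not None:
            hops.append((ascendant[proc], proc))
            proc = ascendant[proc]
        return hops[::-1]

    def visit(proc):
        if proc != root:
            # proc's executive is relayed hop by hop: the last hop reaches
            # proc's boot-loader, earlier hops pass through ascendant executives.
            for src, dst in path_from_root(proc):
                events.append(("exe", src, dst, proc))
        boot_order.append(proc)
        for d in descendants[proc]:
            visit(d)  # descendants are served one after the other
        if proc != root:
            # proc has no more executives to forward: it tells its ascendant.
            events.append(("done", proc, ascendant[proc]))

    visit(root)
    return boot_order, events


if __name__ == "__main__":
    ascendant = {"root": None, "P1": "root", "P2": "root", "P3": "P1"}
    descendants = {"root": ["P1", "P2"], "P1": ["P3"], "P2": [], "P3": []}
    order, log = download_schedule(ascendant, descendants, "root")
    print(order)  # ['root', 'P1', 'P3', 'P2']
    for event in log:
        print(event)
```

For this sample tree, the executive of P3 first crosses the link from root to P1, then is forwarded by P1's executive to P3's boot-loader, before P2 is served at all.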
In the case of a point-to-point medium, the descendant executive may proceed to application data communications as soon as it has no more executives to forward, whereas in the case of a multipoint medium, the descendant executive must wait until the ascendant executive signals that it has no more executives to forward (to avoid interference on the medium between the descendant's application data and the ascendant's download data).
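The difference between point-to-point and multipoint media can be expressed as the moment from which each descendant may start application data communications. The following sketch is hypothetical: it abstracts the download of each descendant's subtree into a single timeline step, and the function name and timeline labels are illustrative only.

```python
def app_start_events(descendant_order, multipoint):
    """Return (timeline, start), where start[d] is the timeline index after
    which descendant d may begin application data communications."""
    timeline, start = [], {}
    for d in descendant_order:
        timeline.append(f"download subtree of {d}")
        timeline.append(f"{d}: no more executives to forward")
        if not multipoint:
            # Point-to-point: d's link is private, so d may start right away.
            start[d] = len(timeline) - 1
    if multipoint:
        # Multipoint: everyone waits for the ascendant's end-of-download signal.
        timeline.append("ascendant: no more executives to forward")
        for d in descendant_order:
            start[d] = len(timeline) - 1
    return timeline, start
```

With descendants `["P1", "P2"]`, a point-to-point medium lets P1 start at step 1 while P2 is still being downloaded, whereas on a multipoint medium both wait for step 4.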
Each processor type may have a different compiler (linker) output format, and some processor types may have a boot-loader embedded in ROM (firmware), with its own requirements on the download format. The SynDEx common download format encapsulates the details of, and the differences between, the compiler output formats and the boot-loaders' download formats; it is composed as follows:
Since the first executive
forwarded to a descendant is received by that descendant's boot-loader,
that executive must be sent without its four-byte prefix;
since the following executives
sent to the same descendant are received and forwarded by that descendant's executive,
they must be sent with their four-byte prefix.
The byte sequence itself must follow the format expected by the destination boot-loader. Therefore a linker post-processor must be developed for each processor type, to translate the linker output file into the SynDEx common download format described above. The makefile concatenates the outputs of all the post-processors into a single contiguous image (file), which the root executive uses as its source.
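The fields of the common download format are not listed here, so the following sketch makes an explicit assumption: that the four-byte prefix of each executive holds the length of its body as a big-endian unsigned 32-bit integer. Under that assumption, it shows how post-processed executives could be concatenated into one contiguous image, split back, and framed for one descendant, the first executive being sent bare to the boot-loader and the following ones with their prefix. The function names are hypothetical.

```python
import struct

# ASSUMPTION: the 4-byte prefix is the body length, big-endian unsigned 32-bit.
PREFIX = struct.Struct(">I")


def make_image(executives):
    """Concatenate post-processed executives, each preceded by its prefix."""
    return b"".join(PREFIX.pack(len(exe)) + exe for exe in executives)


def split_image(image):
    """Yield (prefix, body) pairs from a contiguous image."""
    offset = 0
    while offset < len(image):
        (size,) = PREFIX.unpack_from(image, offset)
        start = offset + PREFIX.size
        end = start + size
        if end > len(image):
            raise ValueError("truncated executive in image")
        yield image[offset:start], image[start:end]
        offset = end


def frames_for_descendant(entries):
    """First executive goes bare to the boot-loader; later ones keep the prefix."""
    for i, (prefix, body) in enumerate(entries):
        yield body if i == 0 else prefix + body
```

The body bytes themselves are left untouched here: converting them to the format expected by each destination boot-loader is the job of the per-processor-type post-processor.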
The downloader code is generated by two macros:
are useful to address processors
connected to a multipoint medium:
a processor name may be suffixed
to give the name of a user-defined macro,
whose expansion gives the processor address.
As executive data may be forwarded through several communication media of different bandwidths, transfers must be synchronized so that data flow at the speed of the slowest communication medium.
Between processors, if flow control is not supported by the communication medium hardware, it must be implemented by ready-to-receive control messages, sent by the loadFrom_ code for each chunk of data to be sent by the loadDnto_ code. Inside a processor, the cooperation between the loadFrom_ and loadDnto_ macros relies on the order in which the spawn_thread_ macros (one for each communication sequence, i.e. for each communication medium) are generated in the initialization phase of the main_ ... endmain_ sequence. The spawn_thread_ macro corresponding to the thread_ macro of the communication sequence that starts with the loadFrom_ macro (i.e. that of the medium connected to the ascendant processor) is called first, followed by the other spawn_thread_ macros, including those, if any, corresponding to the communication sequences containing a loadDnto_ macro (i.e. those of the media connected to the descendant processors).
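Ready-to-receive flow control on one link can be sketched with two queues standing for the two directions of the link. This is a hypothetical Python model, not generated SynDEx code: the chunk size, the empty chunk used as an end marker, and the function names are assumptions. Because the sender waits for a ready message before every chunk, at most one chunk is in flight on the link; when a processor relays chunks, each link is thus throttled by its downstream receiver, which is how the data end up flowing at the speed of the slowest medium.

```python
import queue
import threading

CHUNK = 4  # hypothetical chunk size in bytes


def sender(data, ready_q, data_q):
    """Ascendant side (role of the loadDnto_ code): one chunk per ready message."""
    for i in range(0, len(data), CHUNK):
        ready_q.get()  # block until the receiver says it is ready
        data_q.put(data[i:i + CHUNK])
    ready_q.get()
    data_q.put(b"")  # ASSUMPTION: an empty chunk marks the end of the data


def receiver(ready_q, data_q):
    """Descendant side (role of the loadFrom_ code): request each chunk."""
    received = bytearray()
    while True:
        ready_q.put("ready")  # ready-to-receive control message
        chunk = data_q.get()
        if not chunk:
            return bytes(received)
        received += chunk


def transfer(data):
    ready_q, data_q = queue.Queue(), queue.Queue()
    t = threading.Thread(target=sender, args=(data, ready_q, data_q))
    t.start()
    result = receiver(ready_q, data_q)
    t.join()
    return result


if __name__ == "__main__":
    print(transfer(b"hello world"))  # b'hello world'
```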
If the processor is a leaf node of the download tree, its loadFrom_ macro has only one argument; in this case, it directly generates the code that sends the ascendant processor a "null" message, meaning that no more executives are requested, followed, in the case of a multipoint medium, by the code that waits while other executives are downloaded to the other processors connected to the communication medium, until the ascendant processor sends an "empty" executive, meaning that the download process is complete on this communication medium.
Otherwise, before generating the code described in the previous paragraph, the loadFrom_ macro generates a RETURN instruction (which returns control to the instruction following the CALL instruction generated by the spawn_thread_ macro), followed by a loadFrom_end_: label; the loadFrom_ macro also defines three macros for use by the loadDnto_ macros:
If the code generated by any of these three macros is limited to a few instructions, it may be generated inline; otherwise the loadFrom_ macro generates this code as a subroutine (between the RETURN instruction and the loadFrom_end_ label), and a call to that subroutine is generated instead of the inline code.
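The layout produced by the loadFrom_ macro for leaf and non-leaf processors, and the inline-versus-subroutine choice for the three helper macros, can be modelled as a small code generator. The helper macros are not named here, so this sketch takes them as a hypothetical `helpers` map; the pseudo-instructions, the `_sub` label suffix, and the `INLINE_LIMIT` threshold are all assumptions made for illustration.

```python
INLINE_LIMIT = 3  # hypothetical: maximum instruction count for inline expansion


def expand_load_from(is_leaf, multipoint, helpers):
    """Return (code, call_sites) for a hypothetical loadFrom_ expansion.

    helpers maps a helper-macro name (hypothetical) to its instruction list.
    call_sites maps each helper to what a loadDnto_ expansion would emit:
    either the inline instructions or a single CALL to a generated subroutine.
    """
    code, call_sites = [], {}
    if not is_leaf:
        code.append("RETURN")  # back after the CALL generated by spawn_thread_
        for name, body in helpers.items():
            if len(body) <= INLINE_LIMIT:
                call_sites[name] = list(body)  # short: expanded inline
            else:
                # Long: emitted once as a subroutine before loadFrom_end_.
                code.append(f"{name}_sub:")
                code.extend(body)
                code.append("RETURN")
                call_sites[name] = [f"CALL {name}_sub"]
        code.append("loadFrom_end_:")
    # Code common to all processors: tell the ascendant no more is requested.
    code.append("SEND null TO ascendant")
    if multipoint:
        code.append("WAIT FOR empty executive FROM ascendant")
    return code, call_sites
```

For a leaf processor on a point-to-point medium, the expansion reduces to the single "null" message; a non-leaf processor gets the RETURN, any long helpers as subroutines, and the loadFrom_end_ label in front of that same code.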