What You Should Know About Standards, APIs, Interfaces and Bindings
Many (but not all) computer standards are specified as Application Program Interfaces (APIs). APIs are most commonly expressed as a set of operations, associated data definitions, and the semantics of the operations on some underlying system.
APIs are only useful in the context of an executable computer program, and computer programs must be written in some programming language. So the problem here is how to provide access to the API from multiple programming languages. To explain the problem, we define two related terms:
- Interface: Within the context of an API standard, the abstract expression of the requirements and behaviour of the interface.
- Binding: The realization of an API standard interface in terms of a given programming language.
For example, the POSIX operating system interface (ISO 9945-1:1990) specifies that a file can be opened for input, for output, or for both input and output. It also specifies that there is a data value associated with each open file. The C binding for POSIX specifies that there is an operation named "open", with the following (ANSI C) signature:
    #include <fcntl.h>
    #include <errno.h>
    /* These headers contain definitions for the following:
     *   #define O_RDONLY   read-only access
     *   #define O_RDWR     read-write access
     *   #define O_WRONLY   write-only access
     *   #define EACCES     access denied
     *   #define EROFS      cannot write to read-only file system
     *   extern int errno;
     *   ...
     */
    int open (char* path, int oflag, int mode);
    /* returns -1 for failure and sets errno */
This provides a C programmer with access to the POSIX interface for opening files. But what happens if you are programming in a language other than C? You need a language binding for your programming language, one that maps calls in that language to the underlying POSIX interface.
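From the C side, a call through this binding follows the convention just described: test the result against -1 and, on failure, read errno. The following is a minimal sketch, assuming a POSIX system; the helper name `try_open_read_only` is hypothetical and not part of the standard.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* try_open_read_only is a hypothetical helper, not part of POSIX.
 * It calls the C binding directly and reports failure the POSIX way:
 * a result of -1, with the reason left in errno. */
int try_open_read_only(const char *path)
{
    assert(path != NULL);            /* the caller must supply a path */

    /* The mode argument is only needed when a file may be created,
     * so it is omitted for a plain read-only open. */
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        /* open() has set errno; EACCES and EROFS are two of the
         * possible values. Save it, because fprintf may change it. */
        int saved = errno;
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

A caller would check the result for -1 before using the descriptor, and would eventually release it with `close()`.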
It turns out that there are actually two related problems in describing language bindings to standard API interfaces. One problem deals with the mapping of specific language features/capabilities to the interface, and the other problem relates to how the language binding is documented as a standard.
Most APIs are described in terms of a specific language binding. This comes from the historical fact that most API standards are derived from existing practice that evolved in a specific programming language. The POSIX standards, for instance, derive from the use of the C language in the original Unix system. In this case, a single document specifies both the interface and an associated language binding. Some APIs are written to be 'programming language independent', and use a metalanguage or formal description language to specify the interface. See the work of ISO JTC1/SC22/WG11, or ISO 9075-3, Call Level Interface for Database Language SQL.
Consider the POSIX 'open file' code mentioned earlier, and assume that you want to call this operation from a strongly-typed language such as Ada, Pascal or Modula-2. A "direct mapping" would lexically substitute your Ada/Pascal/Modula-2 identifiers for the equivalent C identifiers. Thus a "direct Ada binding" would preserve the C binding's structure and names, as much as possible:
    package POSIX is
       Errno : Integer;
       function Open
         (Path : in String; Oflags : in Integer; Mode : in Integer) return Integer;
       -- returns -1 for failure and sets the global variable
       -- Errno with information about the reason for failure
    end POSIX;
This direct binding makes no use of the strong type definitions available in Ada. Nor does this direct binding use Ada's exception mechanism.
A direct binding is generally simpler to produce, since it can often be done by machine translation from one programming language to another. But sometimes a concept in one language does not translate to a similar concept in another. (For example, the semantics of the C pre-processor do not map to language features in most other programming languages.) A direct binding also fails to take full advantage of the target programming language, particularly when going from a less expressive language (e.g. one without strong typing) to a semantically richer language (e.g. one with strong typing or an exception model). The alternative is an "abstract binding", which re-expresses the concepts of the interface in the idioms of the target language. In Ada, for example, this could mean distinct types for the parameters and an exception in place of the -1 return value. An abstract binding can make full use of the semantics of the target language, but it requires substantial human analysis to derive the abstractions from the API standard and then to represent them in the target language.
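The flavour of this contrast can be approximated without leaving C. The sketch below is not a language binding in the sense defined earlier, since it stays in C, which has no exceptions and only weak typing. It merely imitates two moves that an abstract binding typically makes: replacing bare integer flags with a closed set of named values, and replacing the "-1 plus errno" convention with named outcomes. All type and function names are hypothetical, and a POSIX system is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* All names below are hypothetical; none are defined by POSIX. */

/* A closed set of access modes instead of a bare int flag word. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_READ_WRITE } access_mode;

/* Named outcomes instead of "-1 and look at errno". */
typedef enum {
    OPEN_OK,
    OPEN_ACCESS_DENIED,   /* errno was EACCES */
    OPEN_READ_ONLY_FS,    /* errno was EROFS */
    OPEN_OTHER_ERROR      /* any other errno value */
} open_status;

/* A distinct handle type, so a descriptor is not "just an int". */
typedef struct { int fd; } file_handle;

open_status file_open(const char *path, access_mode mode, file_handle *out)
{
    assert(path != NULL && out != NULL);

    /* Translate the abstract access mode to the underlying flag. */
    int oflag = O_RDONLY;
    switch (mode) {
    case ACCESS_READ:       oflag = O_RDONLY; break;
    case ACCESS_WRITE:      oflag = O_WRONLY; break;
    case ACCESS_READ_WRITE: oflag = O_RDWR;   break;
    }

    int fd = open(path, oflag);
    if (fd == -1) {
        /* Translate the errno value to a named outcome. */
        switch (errno) {
        case EACCES: return OPEN_ACCESS_DENIED;
        case EROFS:  return OPEN_READ_ONLY_FS;
        default:     return OPEN_OTHER_ERROR;
        }
    }
    out->fd = fd;
    return OPEN_OK;
}

void file_close(file_handle h)
{
    close(h.fd);
}
```

Deciding which outcomes deserve their own name, and which values belong in `access_mode`, is the kind of human analysis that a mechanical, direct translation avoids.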
The other problem in language binding is a problem in standardization. How should the new language binding be documented? One approach, analogous to the "direct mapping" approach for describing the binding, is to produce a "thin document". A thin Ada language binding standard for the operation Open would cite the previously standardized C binding directly:
    The Ada operation POSIX.Open behaves the same as the C operation
    open(). The meanings of the parameters Path, Oflags and Mode match
    their C counterparts, and the integer return value has the same
    meaning as the int value returned by open().
The alternative is to produce each language binding as a self-contained document, a "thick" document. In this case, the full semantics of the file open operation would be described by the appropriate language binding document. Both the C and Ada documents would describe the behaviour of Open, the meaning of the parameters, the errors that are detected, etc.
A "thin document" has the advantage of being easy to generate. It also provides a single point of reference for the semantics of the Interface, by citing one document (the POSIX C binding, in our example) as the single normative reference for the file open operation. The user of the Ada binding would have to refer to both the Ada binding (thin) document and also the C binding document. "Thick documents" duplicate the underlying interface semantics in each language binding. The Ada programmer would have to read only the Ada binding to find out the semantics of the file open operation. Because each binding has a copy of the semantics, there is a potential for conflicts between multiple specifications for the same interface.
This section has presented two related problems in defining language bindings to standard interfaces, particularly those specified in terms of a specific programming language. "Direct vs Abstract" refers to how the features of the underlying interface are mapped to programming language features. "Thick vs Thin" refers to how the documentation for a set of language bindings to the same API is organized.