Software Anatomies, part 1: Anatomy of a CLI Program Written in C

by Matthew Wilson

This article was first published in ACCU's CVu, September 2012. All content copyright Matthew Wilson 2012.

Abstract

This article, the first in a series looking at software anatomy, examines the structure of a command-line program as it grows from a small example main() into a releasable modular project, noting aspects of software quality with a focus on boilerplate and dependency. The principal practical aim of the series is to identify rules for (re-)building code-generation wizards to last me for the next decade.

In this article I'm going to begin by discussing the motivation for the articles (and related programming activities) and then work through the development of an exemplar command-line program, with which I hope to identify a number of issues to be elucidated in subsequent articles.

Introduction

I want to stop thinking! More precisely, I want to stop thinking about basic things. More accurately, I want to stop thinking about fundamental things.

During our 2011-12 Christmas trip back to Blighty I had the singular pleasure of spending 90 conversationally-dense minutes in a London pub with Chris Oldwood and Steve Love talking about, as Steve coined it, "fundamental, not basic" issues of programming. Steve related tales of former colleagues' frustrations with having to think about "basic things", to which he offered the above apposite correction. Thinking about fundamental things is not a waste of time. The fact is software development is still a very young field, as those of us who try hard at our practice know all too well - the bulk of the community do not yet even use basic terms such as "error" properly or definitively [QM-5]!

The more software I write, the more I am concerned with fundamental things. The trains of thought, and the concomitant changes to my practice, that prompted me to start (and soon pick up again) my Quality Matters column, mean that I can no longer develop software in quite the same ways as before. I must perforce consider quality, and most particularly failure, a lot more - diagnostics, contracts, testability, and so on - when I write even simple programs.

But there's a limit to how interesting such concerns can be, and how productive they can let one be, and I've reached it, at least in one area of programming: It's time for me to start drawing some lines in the sand when it comes to command-line interface (CLI) programming. I've been writing CLI programs in C for 25 years (and in other languages for considerable times too). I've been using program-generating wizards for almost twenty years. But these tools are well past their use-by date, not only in terms of the environments within which they run, but also regarding the state of the art of the language(s), libraries, and (good) practices that they employ.

Now I want to identify definitively the "anatomies" of CLI programs, solidify them in libraries and program generating wizards, and just crack on. I also seek, wherever possible, to identify ways in which the boilerplate aspects of programs can be abstracted without detracting from flexibility or transparency, such that their visual impact can be hidden/diminished, thereby increasing the transparency of program-specific code (and, in a real sense, increasing the average transparency of all the code that I write).

Ideally, I'd like to be able to begin this series of articles with an oracular stipulation of the definitive taxonomy of software anatomies and a presentation of matrices of program-area × module-type × language, then distil them in subsequent instalments for particular programming areas in a simple fait accompli. Certainly I do have some strong feelings in this area, e.g. where diagnostic facilities should be located in dependency graphs. Problem is, I don't (yet) know them all: that's what I'm hoping to identify as we go.

So, where to start? Because I've done a tremendous amount of CLI programming (in C, C++, C#, and Ruby) this last couple of years, I do have strong feelings about CLI program anatomies, and much (and varied) experience to back them up. So, the plan is to work bottom up, starting with CLI programs written in C. Naturally, I hope (and request!) to receive plenty of feedback to these articles from you gentle readers, since I cannot expect to have captured all good and bad practices, even in those areas in which I'm most experienced.

Anatomy

First I should explain the use of the term anatomy. Simply, I'm interested in both logical structure and physical structure, so I chose anatomy as an umbrella term, rather than having to constantly refer to "both logical and physical structure". Hopefully it'll catch on. :-)

For example, it's useful that the core elements of a CLI program be (more) testable, particularly in automated test harnesses. Both physical and logical dependencies impact on this. If the core program functionality is located within the same physical file as main(), that increases the difficulties in compiling and linking it into a test harness - we'll be forced to use some Feathers-like pre-processor manipulation [FEATHERS]. Conversely, if the core logic depends on specific third-party general services libraries - e.g. diagnostic logging, contract enforcement, database manipulation, etc. - these will significantly increase the difficulty and scope of the testing.

I hope to elucidate the impacts of both aspects of program anatomy as this series progresses.
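As a minimal sketch of the physical-structure point, assume the slash-swapping logic lives in its own function (and, in a real project, its own source file) that knows nothing about main(). The names swap_slashes() and swap_slashes_in_string() are hypothetical, introduced here only for illustration; the second is a test helper that drives the first through temporary files, so that a harness can exercise the core logic without linking main() or any third-party library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Core (action) logic: depends only on the C standard library.
 * Returns 0 on success, -1 if reading or writing failed. */
int swap_slashes(FILE* in, FILE* out)
{
  int ch;

  assert(NULL != in);
  assert(NULL != out);

  for(; EOF != (ch = fgetc(in)); )
  {
    if('\\' == ch)
    {
      ch = '/';
    }
    if(EOF == fputc(ch, out))
    {
      return -1;
    }
  }

  return ferror(in) ? -1 : 0;
}

/* Hypothetical test helper: feeds a string through swap_slashes() via
 * temporary files, and copies the output into result. */
int swap_slashes_in_string(char const* input, char* result, size_t cchResult)
{
  FILE* in  = tmpfile();
  FILE* out = tmpfile();
  int   r   = -1;

  assert(NULL != input);
  assert(NULL != result && 0 != cchResult);

  result[0] = '\0';

  if(NULL != in && NULL != out)
  {
    fputs(input, in);
    rewind(in);

    r = swap_slashes(in, out);

    if(0 == r)
    {
      size_t n;

      rewind(out);
      n = fread(result, 1, cchResult - 1, out);
      result[n] = '\0';
    }
  }

  if(NULL != in)  { fclose(in); }
  if(NULL != out) { fclose(out); }

  return r;
}
```

Because swap_slashes() takes its streams as parameters, a test can hand it any FILE* it likes; had the loop been written inline in main(), the same test would need pre-processor tricks to get at it.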

Logical Layers of Components/Services

Another aspect in which I'm very interested is the layering of the logical dependencies. Considering just a CLI program, we can identify a number of components/services that may be found:

Operating System services;
Language Runtime services;
Language Standard Library components;
Diagnostic services (including diagnostic logging, runtime contract enforcement, code coverage, memory-leak detection);
Command Line Parsing component; and
other 3rd-party library services/components.

and, of course:

all the programmer-written code: process command-line arguments; decide what to do; do it.

It seems pretty uncontentious to claim that Operating System services must be ready and available to Language Runtime services, and that Language Runtime services must be ready and available to all other components/services. However, I think there will be some equivocation on what the dependency graph looks like beyond that point, and that will also depend on language and program type. All I will stipulate for now is Figure 1; I'll revisit this graph many times in the coming series.

Figure 1

Example Program: slsw

The only way I know to go about this is to use an example, starting simple and building up to what I consider to be a releasable standard, or until I run out of steam or space (or time!). After fluffing around with several different programs I've settled on rewriting an existing tool slsw (slash swap), which, er, swaps slashes in its input to its output.

For this first article there are several simplifications:

no diagnostic logging;
most-basic contract enforcement, using assert(); and
assumption that it is a standalone program. In reality it is one of a suite of related, and similarly implemented tools, the significance of which, to program generation and coding practice, will be discussed at a later time.

Step 1 - Initial Version

The first step is shown in Listing 1. I trust it's largely self-explanatory; the use of stdin and stdout via the in and out variables is a minor sop to revisibility (see sidebar) in light of what's to come.

Listing 1

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv)
    {
      FILE* in  = stdin;
      FILE* out = stdout;
      int   ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        if('\\' == ch)
        {
          ch = '/';
        }
        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }

The implementation is all very well if:

you only want to read from standard input and write to standard output;
you only want to swap backslashes to slashes; and
you don't have to ask the program how to use it.

sidebar : Revisiology & Revisibility

Over the last decade of writing about software, I've become increasingly engaged with the notion of studying the history of a given source code entity throughout its life. As a consequence of many such studies, it's become clear that there are many practices - whether accidental or deliberate - that can significantly affect the ease with which such historical studies may be conducted.

As you may know, gentle readers, I'm always seeking out precise definitions for software development concepts (mainly to aid in clearing the ever-gathering fog of future shock in my own mind), and I occasionally suggest new names for such. I like to think I do the former reasonably well, but concede readily that I often stumble in the latter. And so you have been warned.

With a little help - though the blame remains all my own - from the ACCU General list members, I've devised two names for the two concepts described above:

Revisiology is the study of source control entity histories; and
Revisibility is the degree of ease by which an understanding of (the nature and purpose of) the differences in revisions of a source control entity can be gleaned. Revisibility is of particular interest (to me, at least) where it may be strongly affected by decisions made in aspects of coding in which choices exist. For example, the new code added from Step 2 -> Step 3 is highly revisible. (That it's not good code, in so far as it builds upon poor coding present in Step 2, is incidental to its revisibility, though not to its fitness otherwise!)

How far I will take these terms, and for how long, is yet to be determined, but I'll certainly be using them throughout this series of articles. (I reserve the right to rename them, though, if someone suggests better alternatives.)

Step 2 - read from file/stdin, write to file/stdout

Let's deal with the first of these issues, limitation to using standard input and standard output. While UNIX-like filter programs [PG2L] are most often used in this manner, it is also useful, and therefore common practice, to allow filenames to be specified for the input and/or output, as in:

$ slsw input.txt output.txt

Furthermore, in order to cope with the situation of wanting to write to a named file while still reading from standard input, it's also common to interpret a filename of "-" as meaning read from (or write to) standard input (or standard output); this was discussed in more detail in my article about CLASP [CLASP-2011].

Without resorting to use of any other libraries, we can support almost all of this properly by changing the two lines declaring and assigning to our FILE* variables, as shown in Listing 2. (NOTE: for reasons of brevity in this case I do not test and close FILE* variables, since streams are closed by the runtime when the program exits; it is best practice to do so explicitly in general.)

Listing 2

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char** argv)
    {
      FILE* in  = (argc < 2 || 0 == strcmp("-", argv[1])) ? stdin : fopen(argv[1], "r");
      FILE* out = (argc < 3 || 0 == strcmp("-", argv[2])) ? stdout : fopen(argv[2], "w");
      int   ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

The reason it's only almost proper is because the command-line flag "--", by convention, is used to specify that all subsequent arguments be treated as a value, regardless of whether they begin (or consist solely of) a hyphen. In the case of slsw, this would allow for specification of a file named "-". I'm ignoring this, because the issue was dealt with in the CLASP article, and, as you've probably guessed, I'm going to have to plug CLASP in pretty soon.
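To make the "--" convention concrete, here is a rough sketch in standard C of how value arguments might be classified. It is emphatically not CLASP's implementation: the value_t type and the collect_values() function are hypothetical names invented for this illustration, and anything that looks like a flag is simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a value argument, and whether it denotes a standard
 * stream rather than a named file. */
typedef struct value_t
{
  char const* text;         /* the argument as given */
  int         isStdStream;  /* non-zero only for "-" seen before any "--" */
} value_t;

/* Collects up to maxValues value arguments from argv[1..argc).
 * Returns the number of values found, or -1 if a flag/option (other than
 * "--") is encountered, since flags are outside the scope of this sketch. */
int collect_values(int argc, char** argv, value_t* values, int maxValues)
{
  int n           = 0;
  int literalOnly = 0; /* becomes non-zero once "--" has been seen */
  int i;

  assert(NULL != argv);
  assert(NULL != values || 0 == maxValues);

  for(i = 1; i < argc; ++i)
  {
    char const* const arg = argv[i];

    if(!literalOnly)
    {
      if(0 == strcmp("--", arg))
      {
        literalOnly = 1;
        continue;
      }
      if('-' == arg[0] && '\0' != arg[1])
      {
        return -1;
      }
    }

    if(n < maxValues)
    {
      values[n].text        = arg;
      values[n].isStdStream = !literalOnly && 0 == strcmp("-", arg);
    }
    ++n;
  }

  return n;
}
```

Under these assumptions, "slsw - out.txt" reads from standard input, whereas "slsw -- - out.txt" reads from a file that happens to be named "-".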

The code in Listing 2 works well in the normative case. But it's opaque, and not something anyone would write with any justified pride. Much worse, it (mis-)handles non-normative behaviour by undefined behaviour: passing the name of an unreadable input file causes a segmentation fault (on Mac OS-X, and likely on other systems also). Yikes!

Clearly, we have to check for failure to open named files.

Step 3 - Failure handling

For reasons of pedagogy and revisibility alone, I fix the non-normative issue in the manner shown in Listing 3; if this were to be the (near-)final implementation of the program, I would instead handle the detection and processing of the path arguments together, and much more clearly. Thankfully, I don't have to, because that's already too much silliness and wasted effort in command-line argument processing. It's time to call in CLASP.

Listing 3

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char** argv)
    {
      char const* inName  = NULL;
      char const* outName = NULL;
      FILE* in  = (argc < 2 || 0 == strcmp("-", inName = argv[1])) ? stdin : fopen(inName, "r");
      FILE* out = (argc < 3 || 0 == strcmp("-", outName = argv[2])) ? stdout : fopen(outName, "w");
      int   ch;

      if(NULL == in)
      {
        int const e = errno;
        fprintf(stderr, "slsw: could not open '%s' for read access: %s\n", inName, strerror(e));
        return EXIT_FAILURE;
      }
      if(NULL == out)
      {
        int const e = errno;
        fprintf(stderr, "slsw: could not open '%s' for write access: %s\n", outName, strerror(e));
        return EXIT_FAILURE;
      }

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

Note that, according to UNIX convention, the non-normative output - that which occurs when the program is not achieving its primary purpose: in this case the contingent reports in the unrecoverable condition handlers just added - is marked with the program name, and the normative output is not (as it would stop the program being of any use as a filter).
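For comparison, here is one way the path handling of Steps 2 and 3 might look if detection and opening were done together, with streams also closed explicitly. This is only a sketch: open_or_std() and close_unless_std() are hypothetical helper names, not functions from any library, and, as in Listing 3, the caller is assumed to consult errno when NULL comes back.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns stdStream if name is NULL or "-"; otherwise attempts to open the
 * named file, returning NULL on failure (the caller then reports the
 * failure, as the handlers in Listing 3 do). */
FILE* open_or_std(char const* name, char const* mode, FILE* stdStream)
{
  assert(NULL != mode);
  assert(NULL != stdStream);

  if(NULL == name || 0 == strcmp("-", name))
  {
    return stdStream;
  }

  return fopen(name, mode);
}

/* Closes stm only if it is a stream that we opened ourselves; the
 * standard streams are left to the runtime. */
void close_unless_std(FILE* stm, FILE* stdStream)
{
  if(NULL != stm && stm != stdStream)
  {
    fclose(stm);
  }
}
```

With these, main() could obtain its input via open_or_std(argc > 1 ? argv[1] : NULL, "r", stdin), test the result against NULL, and call close_unless_std(in, stdin) on every exit path. It is clearer than the conditional expressions of Listing 3, but it is still hand-written argument processing.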

Step 4 - using CLASP (longhand)

Plugging CLASP straight into main(), giving Step 4 (see Listing 4), results in a file of nearly triple the size. (For sure, it's a lot more transparent than the mess of Step 3, but still ...). As touched upon in [CLASP-2011], it's almost never the right thing to plug it straight into main() along with your program logic. This issue of where different parts of the program should go is one of the main themes of this article, which I'll go into in a lot more detail later. For now, just go along with the incremental steps, if you don't mind.

Listing 4

    #include <systemtools/clasp/clasp.h>

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static clasp_alias_t aliases[] = 
    {

      CLASP_ALIAS_ARRAY_TERMINATOR
    };


    int main(int argc, char** argv)
    {
      clasp_arguments_t const* args;
      int const cr = clasp_parseArguments(
        CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
      , argc
      , argv
      , aliases
      , NULL
      , &args
      );
      if(0 != cr)
      {
        fprintf(stderr, "slsw: failed to parse command-line: %s\n", strerror(cr));
        return EXIT_FAILURE;
      }
      else
      {
        char const* inName  = NULL;
        char const* outName = NULL;
        FILE* in  = stdin;
        FILE* out = stdout;
        int   ch;
        clasp_argument_t const* arg;

        if(clasp_checkValue(args, 0, &inName, NULL, &arg))
        {
          if(0 == arg->givenName.len)
          {
            if(NULL == (in = fopen(inName, "r")))
            {
              int const e = errno;
              fprintf(stderr, "slsw: could not open '%s' for read access: %s\n", inName, strerror(e));
              clasp_releaseArguments(args);
              return EXIT_FAILURE;
            }
          }
        }
        if(clasp_checkValue(args, 1, &outName, NULL, &arg))
        {
          if(0 == arg->givenName.len)
          {
            if(NULL == (out = fopen(outName, "w")))
            {
              int const e = errno;
              fprintf(stderr, "slsw: could not open '%s' for write access: %s\n", outName, strerror(e));
              clasp_releaseArguments(args);
              return EXIT_FAILURE;
            }
          }
        }

        for(; EOF != (ch = fgetc(in)); )
        . . .

        clasp_releaseArguments(args);
        return EXIT_SUCCESS;
      }
    }

Hopefully it's all pretty self-evident in light of the CLASP article [CLASP-2011], with the probable exception of the check on givenName's length: this uses a feature of CLASP whereby "-" arguments that are not preceded by "--" and thus would usually be interpreted as flags are instead interpreted as values if the CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE parsing flag is specified, and identified as such by having non-empty givenName (and resolvedName, for that matter), which no other values ever have; only when the value is present and is not "-" do we open a file, rather than accept the built-in stream. (Note to self: perhaps some more transparency-engendering macro/function - e.g. clasp_valueIsSingleHyphen() - might be a good addition to the next release.)

Step 5 - using CLASP.Main

Although, in my opinion at least, the code in Step 4 is much improved in transparency - as well as actually properly handling all the required command-line permutations, don't forget - it is actually a good deal more verbose. Furthermore, although not shown in the listing, in the actual code I made two mistakes. The first was in comparing cr less than 0, rather than not equal to 0: easy to do, hard to spot, even harder to test against (since clasp_parseArguments() failures are exceedingly rare [CLASP-2011]). The second was in omitting the first two calls to clasp_releaseArguments(): again, easy to do, and hard to spot.

Thankfully, CLASP.Main, a CLASP extension library, provides a way to avoid (mis-)writing this boilerplate from program to program, via initialisation-function layering (via the ExecuteAroundMethod pattern [EAM]), obviating both my real mistakes. Applying it gives Step 5, as shown in the differential Listing 5.

Listing 5

    #include <systemtools/clasp/clasp.h>
    #include <systemtools/clasp/main.h>

    #include <errno.h>
    . . .

    static clasp_alias_t aliases[] = 
    . . .

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      char const* inName  = NULL;
      . . .

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    int main(int argc, char** argv)
    {
      int const clflags = 0
                        | CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
                        ;

      return clasp_main_invoke(argc, argv, clasp_main, "slsw", aliases, clflags, NULL);
    }

Although the code is a lot clearer, we've still not much reduced the number of source lines. Now is a good time for me to foreshadow one of the themes of this study: considering how much of program source is dedicated to (uninteresting) boilerplate.

The still-small actual "doing" logic - the for-loop - is drowning in a much bigger function steeped in "deciding" logic - the command-line handling - and support/boilerplate logic. The delineations between the deciding and the doing, and the interesting and the uninteresting, are points of interest in program anatomy.

(NOTE: once again, revisibility influences the declaration of clflags, as it allows me to add/remove flags in a manner - one-per-line - that is isolated and unambiguous. This tactic I employ in real work.)
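The Execute-Around Method idea can be illustrated without CLASP at all. The sketch below is not CLASP.Main's implementation; resource_t, with_resource(), inner_main(), and the live_resources counter are hypothetical names standing in for the parsed arguments, the invoking function, the program's "inner main", and a leak check respectively. The point is only that acquisition and release are written once, in the outer function, so the inner function can return early from anywhere without a release call to forget.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resource, standing in for the parsed arguments */
typedef struct resource_t
{
  char* text;
} resource_t;

typedef int (*resource_fn_t)(resource_t const* res);

/* Number of resources acquired and not yet released (illustration only) */
int live_resources = 0;

/* Acquires the resource, executes pfn "around" it, and always releases. */
int with_resource(char const* text, resource_fn_t pfn)
{
  resource_t res;
  int        r;

  assert(NULL != text);
  assert(NULL != pfn);

  /* acquire */
  res.text = (char*)malloc(1 + strlen(text));
  if(NULL == res.text)
  {
    return EXIT_FAILURE;
  }
  strcpy(res.text, text);
  ++live_resources;

  /* execute */
  r = (*pfn)(&res);

  /* release: the only place this needs to be written */
  free(res.text);
  --live_resources;

  return r;
}

/* Hypothetical "inner main": free to return early, with no cleanup */
int inner_main(resource_t const* res)
{
  if('\0' == res->text[0])
  {
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}
```

A real main() would then reduce to a single call along the lines of return with_resource(argv[0], inner_main), which is the shape of the main() in Listing 5.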

Step 6 - Implementing "--help" Flag

It's time to start handling some flags, starting with the conventional "--help" flag: display usage information and quit. Listing 6 shows the differential changes for Step 6.

Listing 6

    . . .

    static clasp_alias_t aliases[] = 
    {
      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;

      if(clasp_flagIsSpecified(args, "--help"))
      {
        clasp_showUsage(
          NULL
        , aliases
        , "slsw"
        , "Synesis Software SystemTools (http://synesis.com.au/systools)"
        , NULL
        , "Swaps slashes in text"
        , "slsw [ ... options ... ] [<input-file>|-] [<output-file>|-]"
        , 0
        , 1
        , 6
        , clasp_showHeaderByFILE
        , clasp_showBodyByFILE
        , stdout
        , 0
        , 76
        , -4
        , 1
        );
        return EXIT_SUCCESS;
      }

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      {
        fprintf(
          stderr
        , "slsw: unrecognised argument: %s\n"
        , arg->givenName.ptr
        );

        return EXIT_FAILURE;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }
    . . .

Disregarding the somewhat alarming use of magic numbers, and assuming your willingness to read the CLASP docs for clasp_showUsage(), this should be reasonably easy to understand. But none of it's interesting: nothing more than more boilerplate.

I've also corrected an earlier oversight: so far, an unrecognised flag passed to the program has been treated as a file name (Steps 1-3) or silently ignored (Steps 4-5), neither of which is appropriate. We need to employ clasp_reportUnusedFlagsAndOptions() (see [CLASP-2011]), as shown, after all known flags/options are processed explicitly.

Packaging Concerns

As of Step 6, only 8 of the 100+ source code lines are actually to do with the business of swapping slashes! Even though that's probably a smaller ratio than would be the case in most programs of greater complexity of purpose, it's still not unrepresentatively low. From my experience in writing CLI programs, it's invariably the case that the wood is obscured by the trees. And this brings me back to two of my areas of interest: coupling and code generation.

As I mentioned in the introduction, I want to be able to start to think less about fundamental issues upon which there is general agreement, and to update my long-in-the-tooth code generation wizards accordingly, and, as a consequence of both, write better software more rapidly.

I've been threatening Steve with writing a series of articles on program anatomy for an embarrassingly long time now, and despite my procrastinations (righteous and otherwise) I have been thinking about the subject a lot. Consequently, I've come to the position that all CLI program logic that is "written" by the author - i.e. is not part of standard, system, or third-party libraries; this includes code that might be wizard-generated at the author's behest - can be considered to comprise the following behavioural/anatomical groups:

Decision logic : the code that works out what needs to be done and which component(s) will do it;
Action logic : the code that does the work deemed necessary by the decision-logic; and
Support logic : all the other stuff, including command-line parsing, diagnostic logging, and so forth.

Of course, now I've said it, it looks blindingly obvious, and not the least original. Furthermore, it's very likely to apply, albeit with differences, to other types of link-units; I've just not given them as much thought yet, so don't want to jump the gun.

But the point I want to proselytise in this article (and the others that'll look at different types of link-units and different languages) is that it's not just a thinking taxonomy: it's a doing one.

Let's consider again our little slsw program. As of Step 6 we can divide the code into the three groups as follows:

Decision:
- Detection of the "--help" flag, and invocation of third-party (CLASP) library functions to respond; or
- Detection of 0-2 command-line values, and invocation of third-party (CLASP) and standard library functions to open named files (and deal with failure to do so);
- Invocation of the fgetc() / fputc() for-loop; and
- Issuing of return value EXIT_SUCCESS.
Action:
- The fgetc() / fputc() for-loop; and
- The invocation of clasp_showUsage().
Support:
- All six #includes;
- Definition of aliases array; and
- All of main().

One could argue that detecting and handling the "--help" flag could, by virtue of its conventional nature, be classed as support logic, but I don't think that's helpful. Rather, it's decision and action logic that can be wizard generated.

Step 7 - Abstracting out "--help" Action Logic

The obvious next step is to implement "--version". But if we follow what was done for "--help" that's going to pad out our "main" (clasp_main()) even more. Innately (or experientially, at least) I have qualms about the anatomy of the program as it stands: all logic is clumped together. So, let's first start making things a bit more transparent by abstracting out the "--help" implementation - the action logic - into a worker function show_help() (see Listing 7).

Listing 7

    . . .

    static
    void show_help(FILE* stm);

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      if(clasp_flagIsSpecified(args, "--help"))
      {
        show_help(stdout);
        return EXIT_SUCCESS;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .

    static
    void show_help(FILE* stm)
    {
      clasp_showUsage(
        NULL
      , aliases
      . . . 
      );
    }

Step 8 - Implementing "--version" Flag

Given the foregoing two steps, the implementation of the "--version" flag is simple and obvious, as shown in Listing 8. Take note that the major, minor, and revision version numbers - 0, 1, and X (currently 8) - are specified in two places! This is a clear violation of DRY SPOT [PragProg, IC++, AoUP], and it won't surprise you in the least to learn that I actually fluffed it and got them out of step during the development. We'll deal with this soon, after we deal with the problem that our slash swapping action logic is drowning in a sea of main()s.

Listing 8

    . . .

    static clasp_alias_t aliases[] = 
    {
      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static
    void show_help(FILE* stm);
    static
    void show_version(FILE* stm);

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      if(clasp_flagIsSpecified(args, "--help"))
      {
        show_help(stdout);
        return EXIT_SUCCESS;
      }
      if(clasp_flagIsSpecified(args, "--version"))
      {
        show_version(stdout);
        return EXIT_SUCCESS;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .

    static
    void show_help(FILE* stm)
    . . .

    static
    void show_version(FILE* stm)
    {
      clasp_showVersion(
        NULL
      , "slsw"
      , 0
      , 1
      , 8
      , clasp_showVersionByFILE
      , stm
      , 0
      );
    }
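One conventional way to remove this kind of duplication, offered here only as a sketch and not necessarily the approach taken later in the series, is to define the version numbers once and refer to them everywhere. The SLSW_VER_* macro names and the format_version() function are hypothetical, as is the output format, which makes no claim to match what clasp_showVersion() prints.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Single point of truth for the version numbers (hypothetical names) */
#define SLSW_VER_MAJOR      0
#define SLSW_VER_MINOR      1
#define SLSW_VER_REVISION   8

/* Writes a version string into buffer.
 * Returns 0 on success, -1 if the buffer is too small or formatting failed. */
int format_version(char* buffer, size_t cchBuffer)
{
  int n;

  assert(NULL != buffer);
  assert(0 != cchBuffer);

  n = snprintf(
        buffer
      , cchBuffer
      , "slsw version %d.%d.%d"
      , SLSW_VER_MAJOR
      , SLSW_VER_MINOR
      , SLSW_VER_REVISION
      );

  return (n < 0 || (size_t)n >= cchBuffer) ? -1 : 0;
}
```

In the program itself the same three macros would simply replace the literal numbers in both show_help() and show_version(), so that the two can no longer drift apart.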

Step 9 - Abstracting out Slash-swapping Action Logic

You can really see the influence of revisibility in this one: I've abstracted out the slash-swapping action logic into the slsw() function by providing a forward function declaration and then applying an extract-as-function refactoring right where it sits, as shown in Listing 9. Now clasp_main() is almost entirely, "cleanly", composed of decision logic: I think that's a major improvement, and sits well with our understanding of its purpose in deciding what to do, and not worrying about how that's done.

Listing 9

    . . .

    static clasp_alias_t aliases[] = 
    . . .

    static
    int slsw(
      FILE* in
    , FILE* out
    );

    static
    void show_help(FILE* stm);
    static
    void show_version(FILE* stm);

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      {
        . . .
      }

      return slsw(
        in
      , out
      );
    }

    static
    int slsw(
      FILE* in
    , FILE* out
    )
    {
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }
    . . .

Note that every code change since Step 1 is code that can, and probably should, be generated by a wizard.

Step 10 - Windows-compatible Swapping

So far, we've spent most of our time looking at command-line handling, and haven't taken a look at all at the slash-swapping action logic itself. One of the first things that jumps out is that it is UNIX-specific: it assumes that backslashes are "wrong" and forward slashes are "right". (To be sure, this is true, but it's not the view of the entire computational world.) We can address this within the newly separated slsw() function, as shown in Listing 10.

Listing 10

    static
    int slsw(
      FILE* in
    , FILE* out
    )
    {
    #if defined(_WIN32)
    # define  SLSW_AMBIENT_CHAR_  '\\'
    # define  SLSW_ALT_CHAR_      '/'
    #elif defined(UNIX) || \
          defined(unix)
    # define  SLSW_AMBIENT_CHAR_  '/'
    # define  SLSW_ALT_CHAR_      '\\'
    #else
    # error Operating-system not discriminated
    #endif

      char const srch = SLSW_ALT_CHAR_;
      char const repl = SLSW_AMBIENT_CHAR_;
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        if(srch == ch)
        {
          ch = repl;
        }
        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }
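A practical caveat with this kind of discrimination is that the UNIX and unix symbols are not defined by every compiler in every mode; they may need to be supplied by the build. The sketch below is a variation, not the author's code: it additionally tests __unix__, __unix, and the __APPLE__/__MACH__ pair, which are symbols commonly predefined by compilers on UNIX-like systems, and it hoists the macros out of the function. The to_ambient() function is a hypothetical name for the per-character step of the loop.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_WIN32)
# define  SLSW_AMBIENT_CHAR_  '\\'
# define  SLSW_ALT_CHAR_      '/'
#elif defined(UNIX) || \
      defined(unix) || \
      defined(__unix__) || \
      defined(__unix) || \
      (defined(__APPLE__) && defined(__MACH__))
# define  SLSW_AMBIENT_CHAR_  '/'
# define  SLSW_ALT_CHAR_      '\\'
#else
# error Operating-system not discriminated
#endif

/* Converts a single character: the non-ambient slash becomes the ambient
 * slash, and every other character is passed through unchanged. */
int to_ambient(int ch)
{
  assert(EOF != ch); /* callers are expected to have tested for end-of-file */

  return (SLSW_ALT_CHAR_ == ch) ? SLSW_AMBIENT_CHAR_ : ch;
}
```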

Step 11 - Implementing "--reverse" Flag

Having got to an implementation of slsw() that works correctly on both UNIX and Windows, it's now time to provide the more sophisticated behaviour that is provided by the extant slsw tool: to be able to "reverse" the ambient swapping (something that is very useful when writing on one operating system about coding on another, as it happens). We'll support this by adding support for a "--reverse" flag, as shown in Listing 11.

Listing 11

    . . .

    static clasp_alias_t aliases[] = 
    {
      CLASP_FLAG("-r", "--reverse", "reverses the swapping from non-ambient=>ambient to ambient=>non-ambient"),

      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static
    int slsw(
      FILE* in
    , FILE* out
    , int   reverse
    );

    . . .

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;
      int reverse = 0;

      if(clasp_flagIsSpecified(args, "--help"))
      . . .
      if(clasp_flagIsSpecified(args, "--version"))
      . . .

      reverse = clasp_flagIsSpecified(args, "--reverse");

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      . . .

      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      . . .

      return slsw(
        in
      , out
      , reverse
      );
    }

    static
    int slsw(
      FILE* in
    , FILE* out
    , int   reverse
    )
    {
    #if defined(_WIN32)
    # define  SLSW_AMBIENT_CHAR_  '\\'
    # define  SLSW_ALT_CHAR_      '/'
    #elif defined(UNIX) || \
          defined(unix)
    # define  SLSW_AMBIENT_CHAR_  '/'
    # define  SLSW_ALT_CHAR_      '\\'
    #else
    # error Operating-system not discriminated
    #endif

      char const srch = reverse ? SLSW_AMBIENT_CHAR_ : SLSW_ALT_CHAR_;
      char const repl = reverse ? SLSW_ALT_CHAR_ : SLSW_AMBIENT_CHAR_;
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

Step 12 - Added "--mode" Option, and Sophisticated Behaviour

Of course, once you start to add sophistication, it's often tempting to add more. We can readily imagine a future version of such a tool needing to expand its abilities as illustrated by Step 12: in addition to the existing 'ambient' and 'reverse' modes, it now also supports 'backward' and 'forward' slashes, and inverting of whatever is encountered, all via the new "--mode" option. Each mode has a flag alias, and the "--reverse" flag becomes a flag alias for backwards-compatibility.

Listing 12

    . . .

    static clasp_alias_t aliases[] = 
    {
      CLASP_OPTION(
                       "-m"
                     , "--mode"
                     , "specifies the mode for slash swapping. "
                       "'ambient' changes non-ambient slashes to ambient slashes, and is the default if mode not specified; "
                       "'back' changes slashes to backslashes; "
                       "'forward' changes backslashes to slashes; "
                       "'invert' inverts all slashes; "
                       "'reverse' does the opposite of 'ambient'."
                     , "|ambient|back|forward|invert|reverse"
                     ),
      CLASP_OPTION_ALIAS("-a", "--mode=ambient"),
      CLASP_OPTION_ALIAS("-b", "--mode=back"),
      CLASP_OPTION_ALIAS("-f", "--mode=forward"),
      CLASP_OPTION_ALIAS("-i", "--mode=invert"),
      CLASP_OPTION_ALIAS("-r", "--mode=reverse"),
    #ifndef SLSW_NO_BACKWARDS_COMPATIBILITY
      CLASP_OPTION_ALIAS("--reverse", "--mode=reverse"), /* backwards compatibility */
    #endif /* SLSW_NO_BACKWARDS_COMPATIBILITY */

      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };


    /* detect operating system */
    #if defined(_WIN32)
    # define        SLSW_OS_IS_WINDOWS
    #elif defined(UNIX) || \
          defined(unix)
    # define        SLSW_OS_IS_UNIX
    #else
    # error Operating-system not discriminated
    #endif

    enum slsw_mode_t
    {
      /* pseudo-modes */
      SLSW_MODE_AMBIENT = 0,
      SLSW_MODE_REVERSE,

      /* real modes */
      SLSW_MODE_INVERT,
    #ifdef SLSW_OS_IS_UNIX
      SLSW_MODE_B2F     = SLSW_MODE_AMBIENT,
      SLSW_MODE_F2B     = SLSW_MODE_REVERSE,
    #endif
    #ifdef SLSW_OS_IS_WINDOWS
      SLSW_MODE_B2F     = SLSW_MODE_REVERSE,
      SLSW_MODE_F2B     = SLSW_MODE_AMBIENT,
    #endif

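      SLSW_MAX_VALUE    = SLSW_MODE_INVERT + 1 /* explicit: the aliased enumerators above would otherwise reset the count */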
    };
    typedef enum slsw_mode_t slsw_mode_t;


    static
    int slsw(
      FILE*       in
    , FILE*       out
    , slsw_mode_t mode
    );

    . . .

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;
      slsw_mode_t     mode = SLSW_MODE_AMBIENT;

      if(clasp_flagIsSpecified(args, "--help"))
      . . .
      if(clasp_flagIsSpecified(args, "--version"))
      . . .

      arg = clasp_findFlagOrOption(args, "--mode", 0);
      if(NULL != arg)
      {
        if(0 == strcmp(arg->value.ptr, "ambient")) { mode = SLSW_MODE_AMBIENT; }
        else
        if(0 == strcmp(arg->value.ptr, "back")) { mode = SLSW_MODE_F2B; }
        else
        if(0 == strcmp(arg->value.ptr, "forward")) { mode = SLSW_MODE_B2F; }
        else
        if(0 == strcmp(arg->value.ptr, "invert")) { mode = SLSW_MODE_INVERT; }
        else
        if(0 == strcmp(arg->value.ptr, "reverse")) { mode = SLSW_MODE_REVERSE; }
        else
        {
          fprintf(stderr, "slsw: invalid mode specified\n");
          return EXIT_FAILURE;
        }
      }

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      . . .

      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      . . .

      return slsw(
        in
      , out
      , mode
      );
    }

    static
    int slsw(
      FILE*       in
    , FILE*       out
    , slsw_mode_t mode
    )
    {
    #ifdef SLSW_OS_IS_WINDOWS
    # define  SLSW_AMBIENT_CHAR_  '\\'
    # define  SLSW_ALT_CHAR_      '/'
    #endif
    #ifdef SLSW_OS_IS_UNIX
    # define  SLSW_AMBIENT_CHAR_  '/'
    # define  SLSW_ALT_CHAR_      '\\'
    #endif

      int ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        switch(ch)
        {
          case  SLSW_AMBIENT_CHAR_:
            if(SLSW_MODE_AMBIENT != mode)
            {
              ch = SLSW_ALT_CHAR_;
            }
            break;
          case  SLSW_ALT_CHAR_:
            if(SLSW_MODE_REVERSE != mode)
            {
              ch = SLSW_AMBIENT_CHAR_;
            }
            break;
        }
        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }

    . . .

I leave as an exercise for the reader an examination of the new action logic - I do rather like the clever interplay 'twixt enumerator values and switch, but I'm probably kidding myself - and instead point out how the revisibility is pretty good for such a large change. In and of itself it doesn't make the code good, but it does help to follow what's happening, which we might presume is an indirect aid to software quality.
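
For readers who would rather check the logic than derive it, here is a sketch of the per-character decision pulled out into a standalone function. To keep it independent of the build platform, the ambient and alternate characters are passed as parameters rather than defined as macros, and only the three distinct enumerator values are modelled. The names demo_mode_t, DEMO_MODE_* and demo_map_char() are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two pseudo-modes and the one real
 * mode that has a value of its own in Listing 12. */
enum demo_mode_t
{
  DEMO_MODE_AMBIENT = 0,
  DEMO_MODE_REVERSE,
  DEMO_MODE_INVERT
};

/* Mirrors the switch in Listing 12 for a single character. */
static
int demo_map_char(
  int               ch
, enum demo_mode_t  mode
, int               ambient
, int               alt
)
{
  assert(ambient != alt);

  if(ch == ambient)
  {
    /* ambient slashes survive only in 'ambient' mode */
    return (DEMO_MODE_AMBIENT != mode) ? alt : ambient;
  }
  if(ch == alt)
  {
    /* non-ambient slashes survive only in 'reverse' mode */
    return (DEMO_MODE_REVERSE != mode) ? ambient : alt;
  }

  /* everything else passes through untouched */
  return ch;
}
```

With '/' as the ambient character, 'ambient' mode maps both slashes to '/', 'reverse' maps both to '\', and 'invert' exchanges them. Since B2F and F2B are defined as whichever of the two pseudo-modes applies on the platform, they need no cases of their own.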

Step 13 - Added Precondition Enforcements to slsw

Having split the action logic out into its own function, it behoves us to add precondition enforcements. The precondition is simple: neither in nor out can be NULL. It is enforced by the standard function-like macro assert(), introduced by <assert.h>. For brevity, no listing is shown of the changes.
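
As a rough indication of the shape of such a change (a sketch only, not the actual Step 13 code), the enforcement amounts to two assert() calls at the top of the function. The function copy_stream() below is a hypothetical stand-in for slsw(), with the swapping logic stripped out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for slsw(): copies in to out unchanged.
 *
 * \pre (NULL != in)
 * \pre (NULL != out)
 */
static
int copy_stream(
  FILE* in
, FILE* out
)
{
  int ch;

  /* precondition enforcements: compiled out when NDEBUG is defined */
  assert(NULL != in);
  assert(NULL != out);

  for(; EOF != (ch = fgetc(in)); )
  {
    fputc(ch, out);
  }

  return 0;
}
```

Passing NULL for either stream in a debug build would abort the program with a diagnostic naming the failed expression, file and line; in a build with NDEBUG defined the checks vanish entirely.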

Step 14 - Handling DRY SPOT Violations

Now to tackle another of the issues that are important to program anatomy: DRY SPOT violations [PragProg, IC++, AoUP]! Specifically, there are four outright violations, and one somewhat subtle one. The outright violations are the multiple uses of literals - "slsw", 0, 1, and 15 (now 16) - for specifying program name and version numbers. The subtle one is the widespread further use of the string "slsw" within various longer literal strings (used for contingent reports). If we choose to change the program name in the future, we'd better hope to be using a good search-replace tool. Better to DRY it now, and have a SPOT.

This was easy to achieve in this case (see Listing 13) via the four object-like macros PROGRAM_NAME, PROGRAM_VER_MAJOR, PROGRAM_VER_MINOR, and PROGRAM_VER_REVISION. That ease is, in part, due to the simplicity of slsw: it is written in C; it is a standalone tool; it does not (yet) use diagnostic logging; the version/usage information is statically determined.

Listing 13

    . . .
    #include <string.h>

    #define PROGRAM_NAME          "slsw"
    #define PROGRAM_VER_MAJOR     0
    #define PROGRAM_VER_MINOR     1
    #define PROGRAM_VER_REVISION  16

    static clasp_alias_t aliases[] =
    . . .


    . . .

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      arg = clasp_findFlagOrOption(args, "--mode", 0);
      if(NULL != arg)
      {
        if(0 == strcmp(arg->value.ptr, "ambient")) { mode = SLSW_MODE_AMBIENT; }
        . . .
        else
        {
          fprintf(
            stderr
          , "%s: invalid mode specified\n"
          , PROGRAM_NAME
          );
          return EXIT_FAILURE;
        }
      }

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      {
        fprintf(
          stderr
        , "%s: unrecognised argument: %s\n"
        , PROGRAM_NAME
        , arg->givenName.ptr
        );

        return EXIT_FAILURE;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      {
        if(0 == arg->givenName.len)
        {
          if(NULL == (in = fopen(inName, "r")))
          {
            int const e = errno;
            fprintf(
              stderr
            , "%s: could not open '%s' for read access: %s\n"
            , PROGRAM_NAME
            , inName
            , strerror(e)
            );
            return EXIT_FAILURE;
          }
        }
      }
      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      {
        if(0 == arg->givenName.len)
        {
          if(NULL == (out = fopen(outName, "w")))
          {
            int const e = errno;
            fprintf(
              stderr
            , "%s: could not open '%s' for write access: %s\n"
            , PROGRAM_NAME
            , outName
            , strerror(e)
            );
            return EXIT_FAILURE;
          }
        }
      }

      . . .
    }

    static
    int slsw(
      FILE*       in
    , FILE*       out
    , slsw_mode_t mode
    )
    . . .

    int main(int argc, char** argv)
    {
      int const clflags = 0
                        | CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
                        ;

      return clasp_main_invoke(argc, argv, clasp_main, PROGRAM_NAME, aliases, clflags, NULL);
    }

    static
    void show_help(FILE* stm)
    {
      clasp_showUsage(
        NULL
      , aliases
      , PROGRAM_NAME
      , "Synesis Software SystemTools (http://synesis.com.au/systools)"
      , NULL
      , "Swaps slashes in text"
      , PROGRAM_NAME " [ ... options ... ] [<input-file>|-] [<output-file>|-]"
      , PROGRAM_VER_MAJOR
      , PROGRAM_VER_MINOR
      , PROGRAM_VER_REVISION
      , clasp_showHeaderByFILE
      , clasp_showBodyByFILE
      , stm
      , 0
      , 76
      , -4
      , 1
      );
    }

    static
    void show_version(FILE* stm)
    {
      clasp_showVersion(
        NULL
      , PROGRAM_NAME
      , PROGRAM_VER_MAJOR
      , PROGRAM_VER_MINOR
      , PROGRAM_VER_REVISION
      , clasp_showVersionByFILE
      , stm
      , 0
      );
    }
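
One detail of Listing 13 worth dwelling on is the usage string, which relies on the compiler concatenating adjacent string literals; that works only because PROGRAM_NAME is a macro that expands to a literal. The sketch below isolates the idea, and also formats the version numbers from the same SPOTs. The USAGE array and the format_version() helper (including its output format) are hypothetical, introduced for this example, and are not part of CLASP.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PROGRAM_NAME          "slsw"
#define PROGRAM_VER_MAJOR     0
#define PROGRAM_VER_MINOR     1
#define PROGRAM_VER_REVISION  16

/* adjacent string literals are merged by the compiler, so the name
 * appears in the usage text without being typed a second time */
static char const USAGE[] =
  PROGRAM_NAME " [ ... options ... ] [<input-file>|-] [<output-file>|-]";

/* Hypothetical helper: writes "<name> version <maj>.<min>.<rev>"
 * into buf, and returns the result of snprintf(). */
static
int format_version(
  char*   buf
, size_t  cchBuf
)
{
  assert(NULL != buf);

  return snprintf(
    buf
  , cchBuf
  , "%s version %d.%d.%d"
  , PROGRAM_NAME
  , PROGRAM_VER_MAJOR
  , PROGRAM_VER_MINOR
  , PROGRAM_VER_REVISION
  );
}
```

Renaming the program, or bumping the revision, is then a one-line change that flows through to both strings.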

Step 15 - Splitting into library & main: "Program Design is Library Design"

Over the years, I've misremembered a Bjarne Stroustrup quote of longstanding. With the assistance of the good folks on ACCU General, I've now ascertained that the original quote is "language design is library design" (and there's also one that says "library design is language design", for good measure), which I realise now doesn't really capture what I want to say here.

Instead, I'm starting my own quote about a form of good practice in program design: program design is library design. You all have my express permission to propagate this to the end of time (with due attribution ;-).

Let's now split up the code we've arrived at thus far along the lines of decision vs action logic, giving three files: slsw.h, slsw.c, and main.c.

slsw.h contains the following:

a #include for stdio.h (because slsw() references the FILE type);
operating system discrimination (because slsw_mode_t requires it);
definition of the slsw_mode_t enumeration; and
declaration of the slsw() function.

slsw.c contains the following:

required #includes, starting with "slsw.h"; and
implementation of the slsw() function.

main.c contains the following (as it did previously):

required #includes: "slsw.h"; then CLASP headers; then standard headers;
SPOTs for PROGRAM_NAME, etc;
aliases array;
forward declarations for show_help() and show_version();
clasp_main();
main(); and
implementations of show_help() and show_version().

I hope it's now clear that the slsw.h (declarations) and slsw.c (implementation) together form a library, which can be used independently of any notion of CLI (or any other particular) execution context. As well as being used within the slsw program, the library may be reused by other programs (e.g. slswgui), and, of particular importance for software quality, in automated test harnesses.
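
To give a flavour of the test-harness case, here is a minimal sketch of a helper that drives a stream-to-stream action through temporary files and compares the result with an expectation. Everything in it is an assumption for illustration: the stream_action_t signature (a thin adapter around slsw() with a fixed mode would fit it), the check_action() helper, the 255-character limit on output, and the back_to_forward() stand-in that is used here in place of the real library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature of the action under test */
typedef int (*stream_action_t)(FILE* in, FILE* out);

/* Hypothetical harness helper: feeds input to action via temporary
 * files and compares what it writes against expected.
 *
 * Returns 1 if the action succeeded and the output matched;
 * 0 otherwise.
 */
static
int check_action(
  stream_action_t action
, char const*     input
, char const*     expected
)
{
  FILE*   in  = tmpfile();
  FILE*   out = tmpfile();
  char    actual[256];
  size_t  n;
  int     ok = 0;

  assert(NULL != action);
  assert(NULL != input);
  assert(NULL != expected);

  if(NULL != in && NULL != out)
  {
    fputs(input, in);
    rewind(in);

    if(0 == action(in, out))
    {
      rewind(out);
      n = fread(actual, 1, sizeof(actual) - 1, out);
      actual[n] = '\0';
      ok = (0 == strcmp(actual, expected));
    }
  }

  if(NULL != in)  { fclose(in); }
  if(NULL != out) { fclose(out); }

  return ok;
}

/* Stand-in action, for demonstration only: backslashes => slashes */
static
int back_to_forward(FILE* in, FILE* out)
{
  int ch;

  for(; EOF != (ch = fgetc(in)); )
  {
    if(EOF == fputc(('\\' == ch) ? '/' : ch, out))
    {
      return -1;
    }
  }

  return ferror(in) ? -1 : 0;
}
```

A test case then reduces to a single line, such as check_action(back_to_forward, "a\\b/c", "a/b/c"), with no command-line, process launch, or file-system fixture in sight.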

Step 16 - Fixing up Coupling and Semantics of slsw()

Now we've abstracted slsw() into a separate file, its coupling - both physical and semantic - to the command-line is evident: it relies on stdlib.h and its return value is EXIT_SUCCESS (or, by implication, EXIT_FAILURE). This is wrong.

We can fix this very easily, simply by changing it to return 0 for success, and non-0 for failure, relying on errno (as set by fgetc() or fputc(), which POSIX requires though the C standard itself does not) for more detailed failure information, as shown in Listings 14-16. As you may know, gentle readers, the standard requires that a program return value of EXIT_SUCCESS is treated as equivalent to 0, and that EXIT_FAILURE is not 0. So we've cunningly done nothing to reduce backwards-compatibility while reducing coupling. Which is nice.

Listing 14

    . . .
    typedef enum slsw_mode_t slsw_mode_t;

    /** Swaps slashes in \c in to \c out, according to \c mode
     *
     * \retval 0 The function succeeded
     * \retval !0 The function failed. errno will indicate reason
     *
     * \pre (NULL != in)
     * \pre (NULL != out)
     */
    int slsw(
      FILE*       in
    , FILE*       out
    , slsw_mode_t mode
    );

Listing 15

    int slsw(
      FILE*       in
    , FILE*       out
    , slsw_mode_t mode
    )
    {
    #ifdef SLSW_OS_IS_WINDOWS
    # define  SLSW_AMBIENT_CHAR_  '\\'
    # define  SLSW_ALT_CHAR_      '/'
    #endif
    #ifdef SLSW_OS_IS_UNIX
    # define  SLSW_AMBIENT_CHAR_  '/'
    # define  SLSW_ALT_CHAR_      '\\'
    #endif

      int ch;

      assert(NULL != in);
      assert(NULL != out);

      for(; EOF != (ch = fgetc(in)); )
      {
        switch(ch)
        {
          case  SLSW_AMBIENT_CHAR_:
            if(SLSW_MODE_AMBIENT != mode)
            {
              ch = SLSW_ALT_CHAR_;
            }
            break;
          case  SLSW_ALT_CHAR_:
            if(SLSW_MODE_REVERSE != mode)
            {
              ch = SLSW_AMBIENT_CHAR_;
            }
            break;
        }
        if(ch != fputc(ch, out))
        {
          return -1;
        }
      }

      if(ferror(in))
      {
        return -1;
      }

      return 0;
    }

Listing 16

    static
    int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .
      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      . . .

      if(0 == slsw(
        in
      , out
      , mode
      ))
      {
        return EXIT_SUCCESS;
      }
      else
      {
        int const e = errno;

        fprintf(
          stderr
        , "%s: failed to complete slash-swapping: %s\n"
        , PROGRAM_NAME
        , strerror(e)
        );

        return EXIT_FAILURE;
      }
    }

Note that slsw() is simple enough that we don't have to do diagnostic logging and contingent reporting here (though even in this we lose the knowledge of whether it's input (fgetc()) or output (fputc()) that fails). More complex action-logic components may have to use more complex failure reporting to their decision-logic callers, including process/thread-global error state variables (a la errno), return codes, exceptions, callbacks, diagnostic logging and contingent reports.
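
As one illustration of the return-codes option, and purely as a sketch, the action function could return distinct codes for read and write failure, so that the caller can tailor its contingent report. The names swap_result_t, SWAP_* and swap_b2f() are hypothetical, and are not part of slsw.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes distinguishing the failing side */
enum swap_result_t
{
  SWAP_OK = 0,
  SWAP_READ_FAILED,
  SWAP_WRITE_FAILED
};
typedef enum swap_result_t swap_result_t;

/* Hypothetical action function: changes backslashes to slashes,
 * reporting which stream failed. errno is left as set by the
 * underlying I/O call, on platforms that set it.
 */
static
swap_result_t swap_b2f(
  FILE* in
, FILE* out
)
{
  int ch;

  assert(NULL != in);
  assert(NULL != out);

  for(; EOF != (ch = fgetc(in)); )
  {
    if('\\' == ch)
    {
      ch = '/';
    }
    if(EOF == fputc(ch, out))
    {
      return SWAP_WRITE_FAILED;
    }
  }

  /* EOF from fgetc() means either end-of-file or a read error */
  return ferror(in) ? SWAP_READ_FAILED : SWAP_OK;
}
```

The caller could then switch on the result and report a read failure or a write failure specifically, alongside strerror(errno), in place of the single catch-all message in Listing 16.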

Summary

This article has examined the incremental development of a simple but real program written in C as a basis for analysis of some of the issues pertaining to CLI program anatomy. In particular, it has discussed the delineation of program implementation into decision logic, action logic, and support logic, and demonstrated how separation of the code on such lines brings several benefits: separation of the action logic into a library increases clarity, scope for reuse, testability, and modularity. This principle, that program design is library design, will be a constant feature of the series.

As a by-product of this exercise, the article has also provided a simple example of function layering: simplifying a large and complex main() by abstracting out the boilerplate support logic in the form of a function to which we pass the address of a smaller, specific "main". Subsequent articles will consider how other services can be initialised in a similar manner, enabling access to sophisticated (albeit uninteresting) functionality with minimal intrusion into the code, preferably in a way that can be wizard-generated.

Finally, the article described the identification and elimination of sources of repetition in the program name and version numbers. In the simple case presented, these "identity attributes" were defined as pre-processor object-like macros. Subsequent articles will consider alternatives, reflecting requirements of language and good practice as well as considering how such attributes may be obtained dynamically (such as from a program's Windows version resource), and how (and when) they must be defined to interact reliably with the phases of "main()"s and various support services.

In the next article, I will turn to the subject of CLI programs written in C++, and consider the advantages and disadvantages as compared to C: by then, all being well, I will have kept my writing momentum up and completed the next Quality Matters instalment - the third in the series on C++ exceptions - for the next issue of Overload and will be able to draw on that also, and so keep down the length.

Further issues of interest to be covered in the next article will include some/all of the following:

Character encodings - multibyte and/or widestring;
Removable Diagnostic Measures - how to facilitate high quality software without undue coupling;
Names - for identity attributes, for namespaces, for files, for project-related directories;
Directories - where to place the decision logic, action logic, the support logic, and the project files;
Testing - how much can be auto-generated by the wizard; and
Function Layering to the Max!

Finally, before the next article in the series I intend to complete the first wizard rewrite, encapsulating all the issues discussed herein, and hope to be able to report back on being able to generate sophisticated, modular, program projects according to the principles and techniques presented thus far. We might even have some downloadable goodies!

Acknowledgements

Many thanks go to Chris Oldwood and Garth Lancaster for helping me despite what has become typically eleventh-hour preparation of the draft. Usual thanks/apologies go to Steve Love. I'd promise to write the next article in plenty of time, but he knows I'd find some reason to break it. Ah well.

Author Bio

Matthew is a software development consultant and trainer for Synesis Software who helps clients to build high-performance software that does not break, and an author of articles and books that attempt to do the same. He can be contacted at matthew@synesis.com.au.

References

[AoUP] The Art of UNIX Programming, Eric S. Raymond, Addison-Wesley, 2003

[CLASP] CLASP is an open-source library for Command-Line Argument Sorting and Parsing, available via the Subversion repository at http://sourceforge.net/projects/systemtools

[CLASP-2011] An Introduction to CLASP, Matthew Wilson, CVu, volume 23 number 6, January 2012

[EAM] http://c2.com/cgi/wiki?ExecuteAroundMethod

[FEATHERS] Working Effectively with Legacy Code, Michael Feathers, Pearson, 2004

[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004

[PragProg] The Pragmatic Programmer, Andy Hunt and Dave Thomas, Addison-Wesley, 1999

[PG2L] A Practical Guide to Linux, Mark G. Sobell, Prentice Hall, 2005

[QM-5] Quality Matters, Part 5: Exceptions: The Worst Form of Error Handling, Apart From All The Others, Matthew Wilson, Overload 98, August 2010