Software Anatomies, part 1: Anatomy of a CLI Program Written in C
by Matthew Wilson
This article first published in ACCU's CVu, September 2012. All content copyright Matthew Wilson 2012.
Abstract
This article, the first in a series looking at software anatomy, examines the structure of a command-line program as it grows from a small example main() into a releasable modular project, noting aspects of software quality with a focus on boilerplate and dependency. The principal practical aim of the series is to identify rules for (re-)building code-generation wizards to last me for the next decade.
In this article I'm going to begin by discussing the motivation for the articles (and related programming activities) and then work through the development of an exemplar command-line program, with which I hope to identify a number of issues to be elucidated in subsequent articles.
Introduction
I want to stop thinking! More precisely, I want to stop thinking about basic things. More accurately, I want to stop thinking about fundamental things.
During our 2011-12 Christmas trip back to Blighty I had the singular pleasure of spending 90 conversationally-dense minutes in a London pub with Chris Oldwood and Steve Love talking about, as Steve coined it "fundamental, not basic" issues of programming. Steve related tales of former colleagues' frustrations with having to think about "basic things", to which he offers the above apposite correction. Thinking about fundamental things is not a waste of time. The fact is software development is still a very young field, as those of us who try hard at our practice know all too well - the bulk of the community do not yet even use basic terms such as "error" properly or definitively [QM-5]!
The more software I write, the more I am concerned with fundamental things. The trains of thought, and the concomitant changes to my practice, that prompted me to start (and soon pick up again) my Quality Matters column, mean that I can no longer develop software in quite the same ways as before. I must perforce consider quality, and most particularly failure, a lot more - diagnostics, contracts, testability, and so on - when I write even simple programs.
But there's a limit to how interesting such concerns can be, and how productive they can let one be, and I've reached it, at least in one area of programming: It's time for me to start drawing some lines in the sand when it comes to command-line interface (CLI) programming. I've been writing CLI programs in C for 25 years (and in other languages for considerable times too). I've been using program-generating wizards for almost twenty years. But these tools are well past use-by-date, not only in terms of the environments within which they run, but also regarding the state of the art of the language(s), libraries, and (good) practices that they employ.
Now I want to identify definitively the "anatomies" of CLI programs, solidify them in libraries and program generating wizards, and just crack on. I also seek, wherever possible, to identify ways in which the boilerplate aspects of programs can be abstracted without detracting from flexibility or transparency, such that their visual impact can be hidden/diminished, thereby increasing the transparency of program-specific code (and, in a real sense, increasing the average transparency of all the code that I write).
Ideally, I'd like to be able to begin this series of articles with an oracular stipulation of the definitive taxonomy of software anatomies and a presentation of matrices of program-area × module-type × language, then distil them in subsequent instalments for particular programming areas in a simple fait accompli. Certainly I do have some strong feelings in this area, e.g. where diagnostic facilities should be located in dependency graphs. Problem is, I don't (yet) know them all: that's what I'm hoping to identify as we go.
So, where to start? Because I've done a tremendous amount of CLI programming (in C, C++, C#, and Ruby) this last couple of years, I do have strong feelings about CLI program anatomies, and much (and varied) experience to back them up. So, the plan is to work bottom up, starting with CLI programs written in C. Naturally, I hope (and request!) to receive plenty of feedback to these articles from you gentle readers, since I cannot expect to have captured all good and bad practices, even in those areas in which I'm most experienced.
Anatomy
First I should explain the use of the term anatomy. Simply, I'm interested in both logical structure and physical structure, so I chose anatomy as an umbrella term, rather than having to constantly refer to "both logical and physical structure". Hopefully it'll catch on. :-)
For example, it's useful that the core elements of a CLI program be (more) testable, particularly in automated test harnesses. Both physical and logical dependencies impact on this. If the core program functionality is located within the same physical file as main(), that increases the difficulties in compiling and linking it into a test harness - we'll be forced to use some Feathers-like pre-processor manipulation [FEATHERS]. Conversely, if the core logic depends on specific third-party general services libraries - e.g. diagnostic logging, contract enforcement, database manipulation, etc. - these will significantly increase the difficulty and scope of the testing.
I hope to elucidate the impacts of both aspects of program anatomy as this series progresses.
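As a minimal sketch of the testability point, assume the core logic lives in its own function, free of main() and of third-party libraries. The names swap_backslashes() and swap_string() are hypothetical and are not part of the slsw source discussed in this article; the harness side uses only the standard tmpfile() facility.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical core function: no main(), no third-party dependencies,
 * so a test harness can compile and link it directly.
 * Returns 0 on success, -1 on a read or write failure.
 */
static int swap_backslashes(FILE* in, FILE* out)
{
    int ch;

    while(EOF != (ch = fgetc(in)))
    {
        if('\\' == ch)
        {
            ch = '/';
        }
        if(EOF == fputc(ch, out))
        {
            return -1;
        }
    }

    return ferror(in) ? -1 : 0;
}

/* Hypothetical test helper: feeds 'input' through the core function via
 * temporary files and copies the result, NUL-terminated, into 'buf'.
 * Returns 0 on success, -1 otherwise.
 */
static int swap_string(char const* input, char* buf, size_t cchBuf)
{
    FILE*   in  =   tmpfile();
    FILE*   out =   tmpfile();
    int     r   =   -1;

    assert(cchBuf > 0);
    buf[0] = '\0';

    if(NULL != in && NULL != out)
    {
        size_t n;

        fputs(input, in);
        rewind(in);

        r = swap_backslashes(in, out);

        rewind(out);
        n = fread(buf, 1, cchBuf - 1, out);
        buf[n] = '\0';
    }

    if(NULL != in)
    {
        fclose(in);
    }
    if(NULL != out)
    {
        fclose(out);
    }

    return r;
}
```

A harness can then call swap_string() with a handful of inputs and compare the results, without ever touching the program's main().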
Logical Layers of Components/Services
Another aspect in which I'm very interested is the layering of the logical dependencies. Considering just a CLI program, we can identify a number of components/services that may be found:
- Operating System services;
- Language Runtime services;
- Language Standard Library components;
- Diagnostic services (including diagnostic logging, runtime contract enforcement, code coverage, memory-leak detection);
- Command Line Parsing component; and
- other 3rd-party library services/components.
and, of course:
- all the programmer-written code: processing command-line arguments; decide what to do; do it.
It seems pretty uncontentious to claim that Operating System services must be ready and available to Language Runtime services, and that Language Runtime services must be ready and available to all other components/services. However, I think there will be some equivocation on what the dependency graph looks like beyond that point, and that will also depend on language and program type. All I will stipulate for now is Figure 1; I'll revisit this graph many times in the coming series.
Example Program: slsw
The only way I know to go about this is to use an example, starting simple and building up to what I consider to be a releasable standard, or until I run out of steam or space (or time!). After fluffing around with several different programs I've settled on rewriting an existing tool slsw (slash swap), which, er, swaps slashes in its input to its output.
For this first article there are several simplifications:
- no diagnostic logging;
- most-basic contract enforcement, using assert(); and
- assumption that it is a standalone program. In reality it is one of a suite of related, and similarly implemented tools, the significance of which, to program generation and coding practice, will be discussed at a later time.
Step 1 - Initial Version
The first step is shown in Listing 1. I trust it's largely self-explanatory; the use of stdin and stdout via the in and out variables is a minor sop to revisibility (see sidebar) in light of what's to come.
Listing 1
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv)
    {
      FILE* in = stdin;
      FILE* out = stdout;
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        if('\\' == ch)
        {
          ch = '/';
        }

        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }
The implementation is all very well if:
- you only want to read from standard input and write to standard output;
- you only want to swap backslashes to slashes; and
- you don't have to ask the program how to use it.
Step 2 - read from file/stdin, write to file/stdout
Let's deal with the first of these issues, limitation to using standard input and standard output. While UNIX-like filter programs [PG2L] are most often used in this manner, it is also useful, and therefore common practice, to allow filenames to be specified for the input and/or output, as in:
$ slsw input.txt output.txt
Furthermore, in order to cope with the situation of wanting to write to a named file while still reading from standard input, it's also common to interpret a filename of "-" as meaning read from (or write to) standard input (or standard output); this was discussed in more detail in my article about CLASP [CLASP-2011].
Without resorting to use of any other libraries, we can support almost all of this properly by changing the two lines declaring and assigning to our FILE* variables, as shown in Listing 2. (NOTE: for reasons of brevity in this case I do not test and close FILE* variables, since streams are closed by the runtime when the program exits; it is best practice to do so explicitly in general.)
Listing 2
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char** argv)
    {
      FILE* in = (argc < 2 || 0 == strcmp("-", argv[1])) ? stdin : fopen(argv[1], "r");
      FILE* out = (argc < 3 || 0 == strcmp("-", argv[2])) ? stdout : fopen(argv[2], "w");
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }
The reason it's only almost proper is because the command-line flag "--", by convention, is used to specify that all subsequent arguments be treated as a value, regardless of whether they begin (or consist solely of) a hyphen. In the case of slsw, this would allow for specification of a file named "-". I'm ignoring this, because the issue was dealt with in the CLASP article, and, as you've probably guessed, I'm going to have to plug CLASP in pretty soon.
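To make the two conventions concrete, here is a minimal hand-rolled sketch, not how CLASP does it. It assumes a hypothetical value_t record and collect_values() function (neither appears in slsw): a bare "-" before "--" is recorded as a request for a standard stream, anything after "--" is taken literally as a value, and other hyphen-prefixed arguments are simply skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one command-line value */
typedef struct value_t
{
    char const* text;           /* the argument text                     */
    int         isStdStream;    /* non-zero for a bare "-" before "--"   */
} value_t;

/* Collects up to maxValues values from argv[1..argc-1]; returns the
 * number collected. Flags and options are skipped, not interpreted.
 */
static int collect_values(int argc, char** argv, value_t* values, int maxValues)
{
    int n       =   0;
    int literal =   0;  /* becomes 1 once "--" has been seen */
    int i;

    assert(NULL != argv);
    assert(NULL != values || 0 == maxValues);

    for(i = 1; i < argc; ++i)
    {
        char const* const a = argv[i];

        if(!literal && 0 == strcmp("--", a))
        {
            literal = 1;    /* everything that follows is a value */
            continue;
        }

        if(!literal && '-' == a[0] && '\0' != a[1])
        {
            continue;       /* a flag/option: out of scope for this sketch */
        }

        if(n < maxValues)
        {
            values[n].text          =   a;
            values[n].isStdStream   =   !literal && 0 == strcmp("-", a);
            ++n;
        }
    }

    return n;
}
```

With the arguments `- -- -`, the first value denotes standard input and the second denotes a file whose name is "-".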
The code in Listing 2 works well in the normative case. But it's opaque, and not something anyone would write with any justified pride. Much worse, it (mis-)handles non-normative behaviour by undefined behaviour: passing the name of an unreadable input file causes a segmentation fault (on Mac OS-X, and likely on other systems also). Yikes!
Clearly, we have to check for failure to open named files.
Step 3 - Failure handling
For reasons of pedagogy and revisibility alone, I fix the non-normative issue in the manner shown in Listing 3; if this were to be the (near-)final implementation of the program, I would instead handle the detection and processing of the path arguments together, and much more clearly. Thankfully, I don't have to, because that's already too much silliness and wasted effort in command-line argument processing. It's time to call in CLASP.
Listing 3
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char** argv)
    {
      char const* inName = NULL;
      char const* outName = NULL;
      FILE* in = (argc < 2 || 0 == strcmp("-", inName = argv[1])) ? stdin : fopen(inName, "r");
      FILE* out = (argc < 3 || 0 == strcmp("-", outName = argv[2])) ? stdout : fopen(outName, "w");
      int ch;

      if(NULL == in)
      {
        int const e = errno;

        fprintf(stderr, "slsw: could not open '%s' for read access: %s\n", inName, strerror(e));

        return EXIT_FAILURE;
      }
      if(NULL == out)
      {
        int const e = errno;

        fprintf(stderr, "slsw: could not open '%s' for write access: %s\n", outName, strerror(e));

        return EXIT_FAILURE;
      }

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }
Note that, according to UNIX convention, the non-normative output - that which occurs when the program is not achieving its primary purpose: in this case the contingent reports in the unrecoverable condition handlers just added - is marked with the program name, and the normative output is not (as it would stop the program being of any use as a filter).
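One way to keep this convention in a single place is to route every contingent report through one helper. The sketch below assumes two hypothetical functions, format_open_failure() and report_open_failure(), which are not part of slsw; the message layout is copied from Listing 3.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a contingent report, prefixed with the
 * program name, into 'buf'. Returns the value returned by snprintf().
 */
static int format_open_failure(
    char*       buf
,   size_t      cchBuf
,   char const* program
,   char const* path
,   char const* access  /* e.g. "read" or "write" */
,   int         e       /* errno value captured right after fopen() */
)
{
    assert(NULL != buf);
    assert(NULL != program);
    assert(NULL != path);
    assert(NULL != access);

    return snprintf(
        buf
    ,   cchBuf
    ,   "%s: could not open '%s' for %s access: %s\n"
    ,   program
    ,   path
    ,   access
    ,   strerror(e)
    );
}

/* Hypothetical wrapper: writes the report to the given stream, which
 * in a real program would be stderr, never stdout.
 */
static void report_open_failure(
    FILE*       stm
,   char const* program
,   char const* path
,   char const* access
,   int         e
)
{
    char msg[512];

    format_open_failure(msg, sizeof(msg), program, path, access, e);
    fputs(msg, stm);
}
```

Normative output continues to go to the output stream unadorned, so the program remains usable as a filter.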
Step 4 - using CLASP (longhand)
Plugging CLASP straight into main(), giving Step 4 (see Listing 4), results in a file of nearly triple the size. (For sure, it's a lot more transparent than the mess of Step 3, but still ...). As touched upon in [CLASP-2011], it's almost never the right thing to plug it straight into main() along with your program logic. This issue of where different parts of the program should go is one of the main themes of this article, which I'll go into in a lot more detail later. For now, just go along with the incremental steps, if you don't mind.
Listing 4
    #include <systemtools/clasp/clasp.h>

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static clasp_alias_t aliases[] =
    {
      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    int main(int argc, char** argv)
    {
      clasp_arguments_t const* args;
      int const cr = clasp_parseArguments(
        CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
      , argc
      , argv
      , aliases
      , NULL
      , &args
      );

      if(0 != cr)
      {
        fprintf(stderr, "slsw: failed to parse command-line: %s\n", strerror(cr));

        return EXIT_FAILURE;
      }
      else
      {
        char const* inName = NULL;
        char const* outName = NULL;
        FILE* in = stdin;
        FILE* out = stdout;
        int ch;
        clasp_argument_t const* arg;

        if(clasp_checkValue(args, 0, &inName, NULL, &arg))
        {
          if(0 == arg->givenName.len)
          {
            if(NULL == (in = fopen(inName, "r")))
            {
              int const e = errno;

              fprintf(stderr, "slsw: could not open '%s' for read access: %s\n", inName, strerror(e));

              clasp_releaseArguments(args);

              return EXIT_FAILURE;
            }
          }
        }
        if(clasp_checkValue(args, 1, &outName, NULL, &arg))
        {
          if(0 == arg->givenName.len)
          {
            if(NULL == (out = fopen(outName, "w")))
            {
              int const e = errno;

              fprintf(stderr, "slsw: could not open '%s' for write access: %s\n", outName, strerror(e));

              clasp_releaseArguments(args);

              return EXIT_FAILURE;
            }
          }
        }

        for(; EOF != (ch = fgetc(in)); )
        . . .

        clasp_releaseArguments(args);

        return EXIT_SUCCESS;
      }
    }
Hopefully it's all pretty self-evident in light of the CLASP article [CLASP-2011], with the probable exception of the check on givenName's length: this uses a feature of CLASP whereby "-" arguments that are not preceded by "--" and thus would usually be interpreted as flags are instead interpreted as values if the CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE parsing flag is specified, and identified as such by having non-empty givenName (and resolvedName, for that matter), which no other values ever have; only when the value is present and is not "-" do we open a file, rather than accept the built-in stream. (Note to self: perhaps some more transparency-engendering macro/function - e.g. clasp_valueIsSingleHyphen() - might be a good addition to the next release.)
Step 5 - using CLASP.Main
Although, in my opinion at least, the code in Step 4 is much improved in transparency - as well as actually properly handling all the required command-line permutations, don't forget - it is actually a good deal more verbose. Furthermore, although not shown in the listing, in the actual code I made two mistakes. The first was in comparing cr less than 0, rather than not equal 0: easy to do, hard to spot, even harder to test against (since clasp_parseArguments() failures are exceedingly rare [CLASP-2011]). The second was in omitting the first two calls to clasp_releaseArguments(): again, easy to do, and hard to spot.
Thankfully, CLASP.Main, a CLASP extension library, provides a way to avoid (mis-)writing this boilerplate from program to program, via initialisation-function layering (via the ExecuteAroundMethod pattern [EAM]), obviating both my real mistakes. Applying it gives Step 5, as shown in the differential Listing 5.
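The Execute-Around Method idea can be sketched in plain C without reference to CLASP. Everything in the block below is hypothetical (parsed_args_t, acquire_args(), release_args(), invoke_around()) and stands in for whatever resource a real library manages; the point is only that acquisition and release live in one function, so the program's own "main" may return early from anywhere without leaking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed-arguments resource */
typedef struct parsed_args_t
{
    int     argc;
    char**  argv;
} parsed_args_t;

typedef int (*program_main_fn)(parsed_args_t const* args);

/* Counts live resources, purely so the sketch can be checked */
static int g_liveResources = 0;

static parsed_args_t* acquire_args(int argc, char** argv)
{
    parsed_args_t* const args = (parsed_args_t*)malloc(sizeof(*args));

    if(NULL != args)
    {
        args->argc = argc;
        args->argv = argv;
        ++g_liveResources;
    }

    return args;
}

static void release_args(parsed_args_t* args)
{
    assert(NULL != args);

    free(args);
    --g_liveResources;
}

/* The "around" function: acquire, invoke the program logic, release.
 * The release happens on every path, whatever pfn returns.
 */
static int invoke_around(int argc, char** argv, program_main_fn pfn)
{
    parsed_args_t*  args;
    int             r;

    assert(NULL != pfn);

    args = acquire_args(argc, argv);

    if(NULL == args)
    {
        return EXIT_FAILURE;
    }

    r = pfn(args);

    release_args(args);

    return r;
}

/* Example program logic with an early return and no release call */
static int example_main(parsed_args_t const* args)
{
    if(args->argc < 2)
    {
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
```

Both of the mistakes described above become impossible to make in the program-specific function, because neither the result check nor the release call is written there.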
Listing 5
    #include <systemtools/clasp/clasp.h>
    #include <systemtools/clasp/main.h>

    #include <errno.h>
    . . .

    static clasp_alias_t aliases[] =
    . . .

    static int clasp_main(clasp_arguments_t const* args)
    {
      char const* inName = NULL;
      . . .

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    int main(int argc, char** argv)
    {
      int const clflags = 0
                        | CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
                        ;

      return clasp_main_invoke(argc, argv, clasp_main, "slsw", aliases, clflags, NULL);
    }
Although the code is a lot clearer, we've still not much reduced the number of source lines. Now is a good time for me to foreshadow one of the themes of this study: considering how much of program source is dedicated to (uninteresting) boilerplate.
The still-small actual "doing" logic - the for-loop - is drowning in a much bigger function steeped in "deciding" logic - the command-line handling - and support/boilerplate logic. The delineations between the deciding and the doing, and the interesting and the uninteresting, are points of interest in program anatomy.
(NOTE: once again, revisibility influences the declaration of clflags, as it allows me to add/remove flags in a manner - one-per-line - that is isolated and unambiguous. This tactic I employ in real work.)
Step 6 - Implementing "--help" Flag
It's time to start handling some flags, starting with the conventional "--help" flag: display usage information and quit. Listing 6 shows the differential changes for Step 6.
Listing 6
    . . .

    static clasp_alias_t aliases[] =
    {
      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;

      if(clasp_flagIsSpecified(args, "--help"))
      {
        clasp_showUsage(
          NULL
        , aliases
        , "slsw"
        , "Synesis Software SystemTools (http://synesis.com.au/systools)"
        , NULL
        , "Swaps slashes in text"
        , "slsw [ ... options ... ] [<input-file>|-] [<output-file>|-]"
        , 0
        , 1
        , 6
        , clasp_showHeaderByFILE
        , clasp_showBodyByFILE
        , stdout
        , 0
        , 76
        , -4
        , 1
        );

        return EXIT_SUCCESS;
      }

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      {
        fprintf(
          stderr
        , "slsw: unrecognised argument: %s\n"
        , arg->givenName.ptr
        );

        return EXIT_FAILURE;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .
Disregarding the somewhat alarming use of magic numbers, and assuming your willingness to read the CLASP docs for clasp_showUsage(), this should be reasonably easy to understand. But none of it's interesting: nothing more than more boilerplate.
I've also corrected an earlier oversight: So far, passing an unrecognised flag to the program will be treated as a file name (Steps 1-3) or silently ignored (Steps 4-5), neither of which is appropriate. We need to employ clasp_reportUnusedFlagsAndOptions() (see [CLASP-2011]), as shown, after all known flags/options are processed explicitly.
Packaging Concerns
As of Step 6, only 8 of the 100+ source code lines are actually to do with the business of swapping slashes! Even though that's probably a smaller ratio than would be the case in most programs of greater complexity of purpose, it's still not unrepresentatively low. From my experience in writing CLI programs, it's invariably the case that the wood is obscured by the trees. And this brings me back to two of my areas of interest: coupling and code generation.
As I mentioned in the introduction, I want to be able to start to think less about fundamental issues upon which there is general agreement, and to update my long-in-the-tooth code generation wizards accordingly, and, as a consequence of both, write better software more rapidly.
I've been threatening Steve with writing a series of articles on program anatomy for an embarrassingly long time now, and despite my procrastinations (righteous and otherwise) I have been thinking about the subject a lot. Consequently, I've come to the position that all CLI program logic that is "written" by the author - i.e. is not part of standard, system, or third-party libraries; this includes code that might be wizard-generated at the author's behest - can be considered to comprise the following behavioural/anatomical groups:
- Decision logic : the code that works out what needs to be done and which component(s) will do it;
- Action logic : the code that does the work deemed necessary by the decision-logic; and
- Support logic : all the other stuff, including command-line parsing, diagnostic logging, and so forth.
Of course, now I've said it, it looks blindingly obvious, and not the least original. Furthermore, it's very likely to apply, albeit with differences, to other types of link-units; I've just not given them as much thought yet, so don't want to jump the gun.
But the point I want to proselytise in this article (and the others that'll look at different types of link-units and different languages) is that it's not just a thinking taxonomy: it's a doing one.
Let's consider again our little slsw program. As of Step 6 we can divide the code into the three groups as follows:
- Decision:
  - Detection of the "--help" flag, and invocation of third-party (CLASP) library functions to respond; or
  - Detection of 0-2 command-line values, and invocation of third-party (CLASP) and standard library functions to open named files (and deal with failure to do so);
  - Invocation of the fgetc()/fputc() for-loop; and
  - Issuing of return value EXIT_SUCCESS.
- Action:
  - The fgetc()/fputc() for-loop; and
  - The invocation of clasp_showUsage().
- Support:
  - All six #includes;
  - Definition of aliases array; and
  - All of main().
One could argue that detecting and handling the "--help" flag could, by virtue of its conventional nature, be classed as support logic, but I don't think that's helpful. Rather, it's decision and action logic that can be wizard generated.
Step 7 - Abstracting out "--help" Action Logic
The obvious next step is to implement "--version". But if we follow what was done for "--help" that's going to pad out our "main" (clasp_main()) even more. Innately (or experientially, at least) I have qualms about the anatomy of the program as it stands: all logic is clumped together. So, let's first start making things a bit more transparent by abstracting out the "--help" implementation - the action logic - into a worker function show_help() (see Listing 7).
Listing 7
    . . .

    static void show_help(FILE* stm);

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      if(clasp_flagIsSpecified(args, "--help"))
      {
        show_help(stdout);

        return EXIT_SUCCESS;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .

    static void show_help(FILE* stm)
    {
      clasp_showUsage(
        NULL
      , aliases
      . . .
      );
    }
Step 8 - Implementing "--version" Flag
Given the foregoing two steps, the implementation of the "--version" flag is simple and obvious, as shown in Listing 8. Take note that the major, minor, and revision version numbers - 0, 1, and X (currently 8) - are specified in two places! This is a clear violation of DRY SPOT [PragProg, IC++, AoUP], and it won't surprise you in the least to learn that I actually fluffed it and got them out of step during the development. We'll deal with this soon, after we deal with the problem that our slash swapping action logic is drowning in a sea of main()s.
Listing 8
    . . .

    static clasp_alias_t aliases[] =
    {
      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static void show_help(FILE* stm);
    static void show_version(FILE* stm);

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      if(clasp_flagIsSpecified(args, "--help"))
      {
        show_help(stdout);

        return EXIT_SUCCESS;
      }

      if(clasp_flagIsSpecified(args, "--version"))
      {
        show_version(stdout);

        return EXIT_SUCCESS;
      }

      if(clasp_checkValue(args, 0, &inName, NULL, &arg))
      . . .

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .

    static void show_help(FILE* stm)
    . . .

    static void show_version(FILE* stm)
    {
      clasp_showVersion(
        NULL
      , "slsw"
      , 0
      , 1
      , 8
      , clasp_showVersionByFILE
      , stm
      , 0
      );
    }
Step 9 - Abstracting out Slash-swapping Action Logic
You can really see the influence of revisibility in this one: I've abstracted out the slash-swapping action logic into the slsw() function by providing a forward function declaration and then applying an extract-as-function refactoring right where the code sits, as shown in Listing 9. Now clasp_main() is almost entirely, "cleanly", composed of decision logic: I think that's a major improvement, and sits well with our understanding of its purpose in deciding what to do, and not worrying about how that's done.
Listing 9
    . . .

    static clasp_alias_t aliases[] =
    . . .

    static int slsw(
      FILE* in
    , FILE* out
    );
    static void show_help(FILE* stm);
    static void show_version(FILE* stm);

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .

      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      {
        . . .
      }

      return slsw(
        in
      , out
      );
    }

    static int slsw(
      FILE* in
    , FILE* out
    )
    {
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }

    . . .
Note that every code change since Step 1 is code that can, and probably should, be generated by a wizard.
Step 10 - Windows-compatible Swapping
So far, we've spent most of our time looking at command-line handling, and haven't taken a look at all at the slash-swapping action logic itself. One of the first things that jumps out is that it is UNIX-specific: it assumes that backslashes are "wrong" and forward slashes are "right". (To be sure, this is true, but it's not the view of the entire computational world.) We can address this within the newly separated slsw() function, as shown in Listing 10.
Listing 10
    static int slsw(
      FILE* in
    , FILE* out
    )
    {
    #if defined(_WIN32)
    # define SLSW_AMBIENT_CHAR_ '\\'
    # define SLSW_ALT_CHAR_     '/'
    #elif defined(UNIX) || \
          defined(unix)
    # define SLSW_AMBIENT_CHAR_ '/'
    # define SLSW_ALT_CHAR_     '\\'
    #else
    # error Operating-system not discriminated
    #endif

      char const srch = SLSW_ALT_CHAR_;
      char const repl = SLSW_AMBIENT_CHAR_;
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        if(srch == ch)
        {
          ch = repl;
        }

        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }
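A note of caution on the discrimination itself: as far as I'm aware, the unix symbol is not predefined by GCC in strict ISO modes (e.g. -std=c99), and Apple's compilers predefine __APPLE__ and __MACH__ rather than unix or __unix__, so a build relying only on UNIX/unix may need one of them defined on the compiler command line. The following sketch widens the test; the OSD_-prefixed macros and to_ambient() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* A wider, still hedged, operating-system discrimination. The extra
 * symbols tested here are compiler-predefined on common toolchains;
 * whether the original build defines UNIX itself is not known.
 */
#if defined(_WIN32)
# define OSD_AMBIENT_CHAR   '\\'
# define OSD_ALT_CHAR       '/'
#elif defined(UNIX) || \
      defined(unix) || \
      defined(__unix) || \
      defined(__unix__) || \
      (defined(__APPLE__) && defined(__MACH__))
# define OSD_AMBIENT_CHAR   '/'
# define OSD_ALT_CHAR       '\\'
#else
# error Operating-system not discriminated
#endif

/* Maps the non-ambient slash to the ambient one; leaves every other
 * character untouched.
 */
static int to_ambient(int ch)
{
    return (OSD_ALT_CHAR == ch) ? OSD_AMBIENT_CHAR : ch;
}
```

Keeping the #error branch preserves the useful property of the original: an unrecognised platform fails at compile time rather than silently picking a direction.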
Step 11 - Implementing "--reverse" Flag
Having got to an implementation of slsw() that works correctly on both UNIX and Windows, it's now time to provide the more sophisticated behaviour that is provided by the extant slsw tool: to be able to "reverse" the ambient swapping (something that is very useful when writing on one operating system about coding on another, as it happens). We'll support this by adding support for a "--reverse" flag, as shown in Listing 11.
Listing 11
    . . .

    static clasp_alias_t aliases[] =
    {
      CLASP_FLAG("-r", "--reverse", "reverses the swapping from non-ambient=>ambient to ambient=>non-ambient"),

      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    static int slsw(
      FILE* in
    , FILE* out
    , int reverse
    );
    . . .

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;
      int reverse = 0;

      if(clasp_flagIsSpecified(args, "--help"))
      . . .

      if(clasp_flagIsSpecified(args, "--version"))
      . . .

      reverse = clasp_flagIsSpecified(args, "--reverse");

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      . . .

      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      . . .

      return slsw(
        in
      , out
      , reverse
      );
    }

    static int slsw(
      FILE* in
    , FILE* out
    , int reverse
    )
    {
    #if defined(_WIN32)
    # define SLSW_AMBIENT_CHAR_ '\\'
    # define SLSW_ALT_CHAR_     '/'
    #elif defined(UNIX) || \
          defined(unix)
    # define SLSW_AMBIENT_CHAR_ '/'
    # define SLSW_ALT_CHAR_     '\\'
    #else
    # error Operating-system not discriminated
    #endif

      char const srch = reverse ? SLSW_AMBIENT_CHAR_ : SLSW_ALT_CHAR_;
      char const repl = reverse ? SLSW_ALT_CHAR_ : SLSW_AMBIENT_CHAR_;
      int ch;

      for(; EOF != (ch = fgetc(in)); )
      . . .

      return EXIT_SUCCESS;
    }
Step 12 - Added "--mode" Option, and Sophisticated Behaviour
Of course, once you start to add sophistication, it's often tempting to add more. We can readily imagine a future version of such a tool needing to expand its abilities as illustrated by Step 12: in addition to the existing 'ambient' and 'reverse' modes, it now also supports 'backward' and 'forward' slashes, and inverting of whatever is encountered, all via the new "--mode" option. Each mode has a flag alias, and the "--reverse" flag becomes a flag alias for backwards-compatibility.
Listing 12
    . . .

    static clasp_alias_t aliases[] =
    {
      CLASP_OPTION(
        "-m"
      , "--mode"
      , "specifies the mode for slash swapping. "
        "'ambient' changes non-ambient slashes to ambient slashes, and is the default if mode not specified; "
        "'back' changes slashes to backslashes; "
        "'forward' changes backslashes to slashes; "
        "'invert' inverts all slashes; "
        "'reverse' does the opposite of 'ambient'."
      , "|ambient|back|forward|invert|reverse"
      ),
      CLASP_OPTION_ALIAS("-a", "--mode=ambient"),
      CLASP_OPTION_ALIAS("-b", "--mode=back"),
      CLASP_OPTION_ALIAS("-f", "--mode=forward"),
      CLASP_OPTION_ALIAS("-i", "--mode=invert"),
      CLASP_OPTION_ALIAS("-r", "--mode=reverse"),
    #ifndef SLSW_NO_BACKWARDS_COMPATIBILITY
      CLASP_OPTION_ALIAS("--reverse", "--mode=reverse"), /* backwards compatibility */
    #endif /* SLSW_NO_BACKWARDS_COMPATIBILITY */

      CLASP_FLAG(NULL, "--help", "invokes this help and terminates"),
      CLASP_FLAG(NULL, "--version", "displays version and terminates"),

      CLASP_ALIAS_ARRAY_TERMINATOR
    };

    /* detect operating system */
    #if defined(_WIN32)
    # define SLSW_OS_IS_WINDOWS
    #elif defined(UNIX) || \
          defined(unix)
    # define SLSW_OS_IS_UNIX
    #else
    # error Operating-system not discriminated
    #endif

    enum slsw_mode_t
    {
      /* pseudo-modes */
      SLSW_MODE_AMBIENT = 0,
      SLSW_MODE_REVERSE,
      /* real modes */
      SLSW_MODE_INVERT,
    #ifdef SLSW_OS_IS_UNIX
      SLSW_MODE_B2F = SLSW_MODE_AMBIENT,
      SLSW_MODE_F2B = SLSW_MODE_REVERSE,
    #endif
    #ifdef SLSW_OS_IS_WINDOWS
      SLSW_MODE_B2F = SLSW_MODE_REVERSE,
      SLSW_MODE_F2B = SLSW_MODE_AMBIENT,
    #endif
      SLSW_MAX_VALUE
    };
    typedef enum slsw_mode_t slsw_mode_t;

    static int slsw(
      FILE* in
    , FILE* out
    , slsw_mode_t mode
    );
    . . .

    static int clasp_main(clasp_arguments_t const* args)
    {
      . . .
      clasp_argument_t const* arg;
      slsw_mode_t mode = SLSW_MODE_AMBIENT;

      if(clasp_flagIsSpecified(args, "--help"))
      . . .

      if(clasp_flagIsSpecified(args, "--version"))
      . . .

      arg = clasp_findFlagOrOption(args, "--mode", 0);
      if(NULL != arg)
      {
        if(0 == strcmp(arg->value.ptr, "ambient"))
        {
          mode = SLSW_MODE_AMBIENT;
        }
        else if(0 == strcmp(arg->value.ptr, "back"))
        {
          mode = SLSW_MODE_F2B;
        }
        else if(0 == strcmp(arg->value.ptr, "forward"))
        {
          mode = SLSW_MODE_B2F;
        }
        else if(0 == strcmp(arg->value.ptr, "invert"))
        {
          mode = SLSW_MODE_INVERT;
        }
        else if(0 == strcmp(arg->value.ptr, "reverse"))
        {
          mode = SLSW_MODE_REVERSE;
        }
        else
        {
          fprintf(stderr, "slsw: invalid mode specified\n");

          return EXIT_FAILURE;
        }
      }

      if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
      . . .

      if(clasp_checkValue(args, 1, &outName, NULL, &arg))
      . . .

      return slsw(
        in
      , out
      , mode
      );
    }

    static int slsw(
      FILE* in
    , FILE* out
    , slsw_mode_t mode
    )
    {
    #ifdef SLSW_OS_IS_WINDOWS
    # define SLSW_AMBIENT_CHAR_ '\\'
    # define SLSW_ALT_CHAR_     '/'
    #endif
    #ifdef SLSW_OS_IS_UNIX
    # define SLSW_AMBIENT_CHAR_ '/'
    # define SLSW_ALT_CHAR_     '\\'
    #endif

      int ch;

      for(; EOF != (ch = fgetc(in)); )
      {
        switch(ch)
        {
          case SLSW_AMBIENT_CHAR_:
            if(SLSW_MODE_AMBIENT != mode)
            {
              ch = SLSW_ALT_CHAR_;
            }
            break;
          case SLSW_ALT_CHAR_:
            if(SLSW_MODE_REVERSE != mode)
            {
              ch = SLSW_AMBIENT_CHAR_;
            }
            break;
        }

        fputc(ch, out);
      }

      return EXIT_SUCCESS;
    }

    . . .
I leave as an exercise for the reader an examination of the new action logic - I do rather like the clever interplay 'twixt enumerator values and switch, but I'm probably kidding myself - and instead point out how the revisibility is pretty good for such a large change. In and of itself it doesn't make the code good, but it does help to follow what's happening, which we might presume is an indirect aid to software quality.
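For readers who would rather check the exercise than do it, here is a minimal sketch that isolates the per-character decision of Listing 12, fixed (as an assumption, to keep it self-contained) to the UNIX case where the ambient slash is '/'. The sketch_-prefixed names, swap_char() and swap_in_place() are hypothetical; only the shape of the switch is taken from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_mode_t
{
    SKETCH_MODE_AMBIENT = 0,    /* non-ambient => ambient            */
    SKETCH_MODE_REVERSE,        /* ambient => non-ambient            */
    SKETCH_MODE_INVERT          /* each kind becomes the other kind  */
};

/* Assumption: UNIX-like host, so the ambient slash is '/' */
#define SKETCH_MODE_AMBIENT_CHAR    '/'
#define SKETCH_MODE_ALT_CHAR        '\\'

/* Same structure as the switch in Listing 12: an ambient character is
 * changed unless the mode is 'ambient'; a non-ambient character is
 * changed unless the mode is 'reverse'. 'invert' therefore changes both.
 */
static int swap_char(int ch, enum sketch_mode_t mode)
{
    switch(ch)
    {
        case SKETCH_MODE_AMBIENT_CHAR:
            if(SKETCH_MODE_AMBIENT != mode)
            {
                ch = SKETCH_MODE_ALT_CHAR;
            }
            break;
        case SKETCH_MODE_ALT_CHAR:
            if(SKETCH_MODE_REVERSE != mode)
            {
                ch = SKETCH_MODE_AMBIENT_CHAR;
            }
            break;
        default:
            break;
    }

    return ch;
}

/* Applies swap_char() to every character of a NUL-terminated string */
static void swap_in_place(char* s, enum sketch_mode_t mode)
{
    assert(NULL != s);

    for(; '\0' != *s; ++s)
    {
        *s = (char)swap_char(*s, mode);
    }
}
```

Under these assumptions the input `a/b\c` becomes `a/b/c` in 'ambient' mode, `a\b\c` in 'reverse' mode, and `a\b/c` in 'invert' mode; 'back' and 'forward' need no cases of their own because their enumerators alias 'reverse' and 'ambient' according to the operating system.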
Step 13 - Added Precondition Enforcements to slsw
Having separated the action logic into a separate function, it behoves us to add precondition enforcements. The precondition is simple: neither in nor out can be NULL. It is enforced by the standard function-like macro assert(), introduced by <assert.h>. For brevity, no listing is shown of the changes.
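Since no listing is given for this step, the following is only a guess at its shape: a minimal sketch using the two-parameter form of the function from Listing 9, hard-coded to the backslash-to-slash direction of Step 1. The name slsw_checked() is hypothetical, chosen to avoid implying this is the actual source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of precondition enforcement on the action-logic function.
 * A NULL stream here is a programming error in the caller (the
 * decision logic), not a runtime condition to be reported to the user.
 */
static int slsw_checked(
    FILE*   in
,   FILE*   out
)
{
    int ch;

    /* Preconditions */
    assert(NULL != in);
    assert(NULL != out);

    for(; EOF != (ch = fgetc(in)); )
    {
        if('\\' == ch)
        {
            ch = '/';
        }

        fputc(ch, out);
    }

    return EXIT_SUCCESS;
}
```

Bear in mind that assert() expands to nothing when NDEBUG is defined, so enforcement of this kind is active only in builds that leave it undefined.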
Step 14 - Handling DRY SPOT Violations
Now to tackle another of the issues that are important to program anatomy: DRY SPOT violations [PragProg, IC++, AoUP]! Specifically, there are four outright violations, and one somewhat subtle one. The outright violations are the multiple uses of literals - "slsw", 0, 1, and 15 (now 16) - for specifying program name and version numbers. The subtle one is the widespread further use of the string "slsw" within various longer literal strings (used for contingent reports). If we choose to change the program name in the future, we'd better hope to be using a good search-replace tool. Better to DRY it now, and have a SPOT.
This was easy to achieve in this case (see Listing 13) via the four object-like macros PROGRAM_NAME, PROGRAM_VER_MAJOR, PROGRAM_VER_MINOR, and PROGRAM_VER_REVISION. That ease is, in part, due to the simplicity of slsw: it is written in C; it is a standalone tool; it does not (yet) use diagnostic logging; the version/usage information is statically determined.
Listing 13
    . . .
    #include <string.h>

    #define PROGRAM_NAME            "slsw"
    #define PROGRAM_VER_MAJOR       0
    #define PROGRAM_VER_MINOR       1
    #define PROGRAM_VER_REVISION    16

    static clasp_alias_t aliases[] =
    . . .
    . . .
    static int clasp_main(clasp_arguments_t const* args)
    {
        . . .
        arg = clasp_findFlagOrOption(args, "--mode", 0);
        if(NULL != arg)
        {
            if(0 == strcmp(arg->value.ptr, "ambient"))
            {
                mode = SLSW_MODE_AMBIENT;
            }
            . . .
            else
            {
                fprintf(
                    stderr
                ,   "%s: invalid mode specified\n"
                ,   PROGRAM_NAME
                );
                return EXIT_FAILURE;
            }
        }
        if(0 != clasp_reportUnusedFlagsAndOptions(args, &arg, 0))
        {
            fprintf(
                stderr
            ,   "%s: unrecognised argument: %s\n"
            ,   PROGRAM_NAME
            ,   arg->givenName.ptr
            );
            return EXIT_FAILURE;
        }
        if(clasp_checkValue(args, 0, &inName, NULL, &arg))
        {
            if(0 == arg->givenName.len)
            {
                if(NULL == (in = fopen(inName, "r")))
                {
                    int const e = errno;
                    fprintf(
                        stderr
                    ,   "%s: could not open '%s' for read access: %s\n"
                    ,   PROGRAM_NAME
                    ,   inName
                    ,   strerror(e)
                    );
                    return EXIT_FAILURE;
                }
            }
        }
        if(clasp_checkValue(args, 1, &outName, NULL, &arg))
        {
            if(0 == arg->givenName.len)
            {
                if(NULL == (out = fopen(outName, "w")))
                {
                    int const e = errno;
                    fprintf(
                        stderr
                    ,   "%s: could not open '%s' for write access: %s\n"
                    ,   PROGRAM_NAME
                    ,   outName
                    ,   strerror(e)
                    );
                    return EXIT_FAILURE;
                }
            }
        }
        . . .
    }

    static int slsw(FILE* in, FILE* out, slsw_mode_t mode)
    . . .

    int main(int argc, char** argv)
    {
        int const clflags = 0
                          | CLASP_F_TREAT_SINGLEHYPHEN_AS_VALUE
                          ;

        return clasp_main_invoke(argc, argv, clasp_main, PROGRAM_NAME, aliases, clflags, NULL);
    }

    static void show_help(FILE* stm)
    {
        clasp_showUsage(
            NULL
        ,   aliases
        ,   PROGRAM_NAME
        ,   "Synesis Software SystemTools (http://synesis.com.au/systools)"
        ,   NULL
        ,   "Swaps slashes in text"
        ,   PROGRAM_NAME " [ ... options ... ] [<input-file>|-] [<output-file>|-]"
        ,   PROGRAM_VER_MAJOR
        ,   PROGRAM_VER_MINOR
        ,   PROGRAM_VER_REVISION
        ,   clasp_showHeaderByFILE
        ,   clasp_showBodyByFILE
        ,   stm
        ,   0
        ,   76
        ,   -4
        ,   1
        );
    }

    static void show_version(FILE* stm)
    {
        clasp_showVersion(
            NULL
        ,   PROGRAM_NAME
        ,   PROGRAM_VER_MAJOR
        ,   PROGRAM_VER_MINOR
        ,   PROGRAM_VER_REVISION
        ,   clasp_showVersionByFILE
        ,   stm
        ,   0
        );
    }
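The listing relies on two ways of consuming the SPOT macros: as a %s argument at run time, and by adjacent string-literal concatenation at compile time (as in the usage string). The sketch below shows both in isolation. The STRINGIZE helpers, PROGRAM_VER_STRING and report_failure() are hypothetical names introduced purely for illustration; they are not part of slsw or CLASP.

```c
#include <assert.h>
#include <stdio.h>

#define PROGRAM_NAME            "slsw"
#define PROGRAM_VER_MAJOR       0
#define PROGRAM_VER_MINOR       1
#define PROGRAM_VER_REVISION    16

/* Hypothetical helpers: two-level stringification, so that the macro's
 * value (e.g. 16), not its name, is what gets turned into a string. */
#define STRINGIZE_(x)   #x
#define STRINGIZE(x)    STRINGIZE_(x)

/* Hypothetical derived attribute, assembled at compile time from the
 * four SPOT macros by adjacent string-literal concatenation. */
#define PROGRAM_VER_STRING                  \
    PROGRAM_NAME " "                        \
    STRINGIZE(PROGRAM_VER_MAJOR) "."        \
    STRINGIZE(PROGRAM_VER_MINOR) "."        \
    STRINGIZE(PROGRAM_VER_REVISION)

/* Hypothetical helper: writes a contingent report prefixed with the
 * program name; returns whatever fprintf() returns. */
static int report_failure(FILE* stm, char const* message)
{
    assert(NULL != stm);
    assert(NULL != message);

    return fprintf(stm, "%s: %s\n", PROGRAM_NAME, message);
}
```

With the definitions above, PROGRAM_VER_STRING expands to the single literal "slsw 0.1.16", and renaming the program means touching exactly one line.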
Step 15 - Splitting into library & main: "Program Design is Library Design"
Over the years, I've misremembered a Bjarne Stroustrup quote of longstanding. With the assistance of the good folks on ACCU General, I've now ascertained that the original quote is "language design is library design" (and there's also one that says "library design is language design", for good measure), which I realise now doesn't really capture what I want to say here.
Instead, I'm starting my own quote about a form of good practice in program design: program design is library design. You all have my express permission to propagate this to the end of time (with due attribution ;-).
Let's now split up the code we've arrived at thus far along the lines of decision vs action logic, giving three files: slsw.h, slsw.c, and main.c.
slsw.h contains the following:
- a #include for stdio.h (because slsw() references the FILE type);
- operating system discrimination (because slsw_mode_t requires it);
- definition of the slsw_mode_t enumeration; and
- declaration of the slsw() function.
slsw.c contains the following:
- required #includes, starting with "slsw.h"; and
- implementation of the slsw() function.
main.c contains the following (as it did previously):
- required #includes: "slsw.h"; then CLASP headers; then standard headers;
- SPOTs for PROGRAM_NAME, etc;
- aliases array;
- forward declarations for show_help() and show_version();
- clasp_main();
- main(); and
- implementations of show_help() and show_version().
I hope it's now clear that the slsw.h (declarations) and slsw.c (implementation) together form a library, which can be used independently of any notion of CLI (or any other particular) execution context. As well as being used within the slsw program, the library may be reused by other programs (e.g. slswgui), and, of particular importance for software quality, in automated test harnesses.
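To make the testability point concrete, here is a minimal sketch of how a harness might drive such a library function without any command line in sight. Everything in it is an assumption made for illustration: slsw_b2f() is a simplified stand-in for slsw() (it has no mode parameter and always converts backslashes to forward slashes), and check_b2f() is a hypothetical helper that uses the standard tmpfile() function to supply the two streams.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the library function: always converts backslashes to
 * forward slashes. It exists only to keep the sketch self-contained. */
static int slsw_b2f(FILE* in, FILE* out)
{
    int ch;

    assert(NULL != in);
    assert(NULL != out);

    while(EOF != (ch = fgetc(in)))
    {
        if('\\' == ch) { ch = '/'; }
        if(EOF == fputc(ch, out)) { return -1; }
    }
    return ferror(in) ? -1 : 0;
}

/* Hypothetical harness helper: feeds `input` through the function via
 * temporary files and compares the output with `expected`.
 * Returns 1 on match, 0 on mismatch, -1 if the harness itself failed.
 * Outputs longer than 255 characters are truncated before comparison. */
static int check_b2f(char const* input, char const* expected)
{
    char    actual[256];
    size_t  n;
    int     result = -1;
    FILE*   in  = tmpfile();
    FILE*   out = tmpfile();

    if(NULL != in && NULL != out && EOF != fputs(input, in))
    {
        rewind(in);     /* switch the update stream from writing to reading */
        if(0 == slsw_b2f(in, out))
        {
            rewind(out);
            n = fread(actual, 1, sizeof(actual) - 1, out);
            actual[n] = '\0';
            result = (0 == strcmp(actual, expected));
        }
    }
    if(NULL != in)  { fclose(in); }
    if(NULL != out) { fclose(out); }
    return result;
}
```

A test case then reduces to a single line, such as check_b2f("C:\\dir\\file.txt", "C:/dir/file.txt"), with no process to launch and no arguments to parse.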
Step 16 - Fixing up Coupling and Semantics of slsw()
Now we've abstracted slsw() into a separate file, its coupling - both physical and semantic - to the command-line is evident: it relies on stdlib.h and its return value is EXIT_SUCCESS (or, by implication, EXIT_FAILURE). This is wrong.
We can fix this very easily, simply by changing it to return 0 for success, and non-0 for failure, relying on errno (as set by fgetc() or fputc()) for more detailed failure information, as shown in Listings 14-16. As you may know, gentle readers, the standard requires that a program return value of 0 is treated, just like EXIT_SUCCESS, as successful termination, and that EXIT_FAILURE denotes unsuccessful termination (and so cannot be 0). So we've cunningly done nothing to reduce backwards-compatibility while reducing coupling. Which is nice.
Listing 14
    . . .
    typedef enum slsw_mode_t slsw_mode_t;

    /** Swaps slashes in \c in to \c out, according to \c mode
     *
     * \retval 0 The function succeeded
     * \retval !0 The function failed. errno will indicate reason
     *
     * \pre (NULL != in)
     * \pre (NULL != out)
     */
    int slsw(
        FILE*       in
    ,   FILE*       out
    ,   slsw_mode_t mode
    );
Listing 15
    int slsw(
        FILE*       in
    ,   FILE*       out
    ,   slsw_mode_t mode
    )
    {
    #ifdef SLSW_OS_IS_WINDOWS
    # define SLSW_AMBIENT_CHAR_     '\\'
    # define SLSW_ALT_CHAR_         '/'
    #endif
    #ifdef SLSW_OS_IS_UNIX
    # define SLSW_AMBIENT_CHAR_     '/'
    # define SLSW_ALT_CHAR_         '\\'
    #endif
        int ch;

        assert(NULL != in);
        assert(NULL != out);

        for(; EOF != (ch = fgetc(in)); )
        {
            switch(ch)
            {
                case SLSW_AMBIENT_CHAR_:
                    if(SLSW_MODE_AMBIENT != mode)
                    {
                        ch = SLSW_ALT_CHAR_;
                    }
                    break;
                case SLSW_ALT_CHAR_:
                    if(SLSW_MODE_REVERSE != mode)
                    {
                        ch = SLSW_AMBIENT_CHAR_;
                    }
                    break;
            }
            if(ch != fputc(ch, out))
            {
                return -1;
            }
        }
        if(ferror(in))
        {
            return -1;
        }
        return 0;
    }
Listing 16
    static int clasp_main(clasp_arguments_t const* args)
    {
        . . .
        if(clasp_checkValue(args, 0, &inName, NULL, &arg))
        . . .
        if(clasp_checkValue(args, 1, &outName, NULL, &arg))
        . . .
        if(0 == slsw(in, out, mode))
        {
            return EXIT_SUCCESS;
        }
        else
        {
            int const e = errno;
            fprintf(
                stderr
            ,   "%s: failed to complete slash-swapping: %s\n"
            ,   PROGRAM_NAME
            ,   strerror(e)
            );
            return EXIT_FAILURE;
        }
    }
Note that slsw() is simple enough that we don't have to do diagnostic logging and contingent reporting here (though even in this we lose the knowledge of whether it's input (fgetc()) or output (fputc()) that fails). More complex action-logic components may have to use more complex failure reporting to their decision-logic callers, including process/thread-global error state variables (a la errno), return codes, exceptions, callbacks, diagnostic logging and contingent reports.
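As a sketch of the simplest of those alternatives, distinct return codes would be enough to recover the lost knowledge of which side failed. The names below (copy_rc_t, copy_swapped(), copy_rc_string()) are hypothetical and do not form part of the slsw interface presented in the listings; the function is also simplified to a fixed backslash-to-forward-slash conversion to keep the example short.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical return codes: not part of the article's slsw() interface. */
enum copy_rc_t
{
    COPY_RC_SUCCESS = 0,
    COPY_RC_READ_FAILED,
    COPY_RC_WRITE_FAILED
};
typedef enum copy_rc_t copy_rc_t;

/* Copies in to out, converting backslashes to forward slashes, and
 * reports which side failed. errno is left untouched, so the caller can
 * still consult it where the platform's library sets it. */
static copy_rc_t copy_swapped(FILE* in, FILE* out)
{
    int ch;

    assert(NULL != in);
    assert(NULL != out);

    while(EOF != (ch = fgetc(in)))
    {
        if('\\' == ch) { ch = '/'; }
        if(EOF == fputc(ch, out)) { return COPY_RC_WRITE_FAILED; }
    }
    /* EOF from fgetc() means either end-of-file or a read error */
    return ferror(in) ? COPY_RC_READ_FAILED : COPY_RC_SUCCESS;
}

/* Maps a return code to text suitable for a contingent report. */
static char const* copy_rc_string(copy_rc_t rc)
{
    switch(rc)
    {
        case COPY_RC_SUCCESS:       return "success";
        case COPY_RC_READ_FAILED:   return "failed to read from input";
        case COPY_RC_WRITE_FAILED:  return "failed to write to output";
    }
    return "unknown failure";
}
```

The decision logic could then pass copy_rc_string(rc) alongside strerror(e) in its contingent report, at the cost of a slightly wider interface between library and caller.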
Summary
This article has examined the incremental development of a simple but real program written in C as a basis for analysis of some of the issues pertaining to CLI program anatomy. In particular, it has discussed the delineation of program implementation into decision logic, action logic, and support logic, and demonstrated how separation of the code on such lines brings several benefits: separation of the action logic into a library increases clarity, scope for reuse, testability, and modularity. This principle, "program design is library design", will be a constant feature of the series.
As a by-product of this exercise, the article has also provided a simple example of function layering: simplifying a large and complex main() by abstracting out the boilerplate support logic in the form of a function to which we pass the address of a smaller, specific "main". Subsequent articles will consider how other services can be initialised in a similar manner, enabling access to sophisticated (albeit uninteresting) functionality with minimal intrusion into the code, preferably in a way that can be wizard-generated.
Finally, the article described the identification and elimination of sources of repetition in the program name and version numbers. In the simple case presented, these "identity attributes" were defined as pre-processor object-like macros. Subsequent articles will consider alternatives, reflecting requirements of language and good practice as well as considering how such attributes may be obtained dynamically (such as from a program's Windows version resource), and how (and when) they must be defined to interact reliably with the phases of "main()"s and various support services.
Next
In the next article, I will turn to the subject of CLI programs written in C++, and consider the advantages and disadvantages as compared to C: by then, all being well, I will have kept my writing momentum up and completed the next Quality Matters instalment - the third in the series on C++ exceptions - for the next issue of Overload and will be able to draw on that also, and so keep down the length.
Further issues of interest to be covered in the next article will include some/all of the following:
- Character encodings - multibyte and/or widestring;
- Removable Diagnostic Measures - how to facilitate high quality software without undue coupling;
- Names - for identity attributes, for namespaces, for files, for project-related directories;
- Directories - where to place the decision logic, action logic, the support logic, and the project files;
- Testing - how much can be auto-generated by the wizard; and
- Function Layering to the Max!
Finally, before the next article in the series I intend to complete the first wizard rewrite, encapsulating all the issues discussed herein, and hope to be able to report back on being able to generate sophisticated, modular, program projects according to the principles and techniques presented thus far. We might even have some downloadable goodies!
Acknowledgements
Many thanks go to Chris Oldwood and Garth Lancaster for helping me despite what has become typically eleventh-hour preparation of the draft. Usual thanks/apologies go to Steve Love. I'd promise to write the next article in plenty of time, but he knows I'd find some reason to break it. Ah well.
Author Bio
Matthew is a software development consultant and trainer for Synesis Software who helps clients to build high-performance software that does not break, and an author of articles and books that attempt to do the same. He can be contacted at matthew@synesis.com.au.
References
[AoUP] The Art of UNIX Programming, Eric S. Raymond, Addison-Wesley, 2003
[CLASP] CLASP is an open-source library for Command-Line Argument Sorting and Parsing, available via the Subversion repository at http://sourceforge.net/projects/systemtools
[CLASP-2011] An Introduction to CLASP, Matthew Wilson, CVu, volume 23 number 6, January 2012
[EAM] http://c2.com/cgi/wiki?ExecuteAroundMethod
[FEATHERS] Working Effectively with Legacy Code, Michael Feathers, Pearson, 2004
[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004
[PragProg] The Pragmatic Programmer, Andy Hunt and Dave Thomas, Addison-Wesley, 1999
[PG2L] A Practical Guide to Linux, Mark G. Sobell, Prentice Hall, 2005
[QM-5] Quality Matters, Part 5: Exceptions: The Worst Form of Error Handling, Apart From All The Others, Matthew Wilson, Overload 98, August 2010