cpif
====
Jordi Fita <jfita@geishastudios.com>
:keywords: literate programming, command line, tool, Vala
:comments:

:1: http://live.gnome.org/Vala

'cpif' is a simple command line tool written in {1}[Vala] that overwrites a
file with the contents of the standard input if and only if the output file and
the data read differ.

:2: http://en.wikipedia.org/wiki/Make

This tool is very convenient when building literate programs with automation
software like {2}[make].  As make uses each file's modification time to decide
whether to rebuild a target, a literate program that tangles multiple source
code files from the same document would otherwise touch all of them on every
change.  Using 'cpif', I can avoid rewriting the files whose content hasn't
changed and thus prevent long recompilation times.

[role="downloadbar"]
****
image:images/down_win.png["Windows Download",link="http://www.geishastudios.com/download/cpif-win32.zip"]
image:images/down_source.png["Source Code Download",link="http://www.geishastudios.com/download/cpif.tar.bz2"]
****


//{{{1
The Program
-----------

cpif must retrieve the output file to overwrite from the command line
parameters, read everything that it can from the standard input, and store the
data read in a temporary file, so as not to use too much memory.  Then it
compares this temporary file with the destination file, if it exists.  If the
destination and the temporary files have the same content, cpif deletes the
temporary file.  Otherwise, it overwrites the output file with the temporary
file.

[source, vala]
----
<<main function>>=
int main(string[] args)
{
    try {
        <<get the destination file name from the parameters>>
        <<copy stdin to a temporary file>>
        if (files_are_equal(tempFileName, destFileName)) {
            <<remove temporary file>>
        } else {
            <<overwrite destination file with temporary file>>
        }
        return 0;
    } catch (FileError e) {
        stderr.printf("Error: %s\n", e.message);
    }
    return -1;
}
----

Thus, cpif must check that the user passed the destination file as a command
line parameter.  Without this parameter, cpif must show its usage and abort the
program execution.  If there is more than one parameter, the extra ones are
ignored, as I only store the first parameter as the output file name.

[source, vala]
----
<<get the destination file name from the parameters>>=
if (args.length < 2) {
    stderr.printf("Usage: %s destination\n", args[0]);
    return -1;
}
string destFileName = args[1];
----

Besides the destination file, I also need to create a temporary file with the
contents from the standard input to compare to the destination file.

[source, vala]
----
<<copy stdin to a temporary file>>=
string tempFileName = copy_to_temp_file(stdin);
----

The function that copies the content from the standard input to a temporary
file actually doesn't know where it is copying the data from.  This function
expects a `FileStream` parameter that it uses to read the data and copy it to a
new temporary file.  After the copy is done, this function returns the name of
the file where it copied the data to.

[source, vala]
----
<<function to create the temporary file>>=
string copy_to_temp_file(GLib.FileStream input) throws FileError
{
    <<create temporary file>>
    <<open temporary file>>
    <<copy input to temporary file>>

    return tempFileName;
}
----

To create the temporary file, Vala offers various functions.  The most obvious
function to use in this case is `open_tmp` which, given a file name template,
creates and opens a file.  The only problem I have with this function is that
it creates the file in the system's temporary folder.  This is not a problem
per se, but in most installations the system's temporary folder and the home
folder are in different file systems, which means that to overwrite the
destination file I must 'copy' the contents from one file to the other instead
of just renaming the temporary file.  I want to avoid copying too much if
possible.

Given that in most cases the destination file is in the working directory,
what I do instead is create the temporary file there using `mkstemp`.  If
this function fails, then I fall back to `open_tmp`.  Fortunately, I end up
with the same result with both functions: a file name and an open file
descriptor.  The only 'gotcha' is that `mkstemp` *modifies* the input template
and thus I can't pass a string literal and I must use a `string` variable.
This is also the reason I have to pass the same template again to `open_tmp` if
I couldn't use `mkstemp`.

Additionally, I don't check the output of `open_tmp` because if this function
fails it throws a `FileError` exception which is exactly what I want.

[source, vala]
----
<<create temporary file>>=
string tempFileName = "cpif-XXXXXX";
int file_descriptor = GLib.FileUtils.mkstemp(tempFileName);
if (file_descriptor < 0) {
    file_descriptor = GLib.FileUtils.open_tmp("cpif-XXXXXX", out tempFileName);
}
----
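
The same 'gotcha' shows up in plain C.  The following is only a sketch that
uses the POSIX `mkstemp` function, which I assume behaves like the call that
`GLib.FileUtils.mkstemp` wraps.  The helper name `make_temp_in_cwd` is
hypothetical and is not part of cpif.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates a temporary file in the working directory.
 * `name` must be a writable buffer, never a string literal, because
 * mkstemp overwrites the trailing "XXXXXX" with the characters it chose.
 * Returns the open file descriptor, or -1 on failure. */
int make_temp_in_cwd(char *name, size_t size)
{
    static const char pattern[] = "cpif-XXXXXX";

    assert(name != NULL);
    if (size < sizeof pattern) {
        return -1;              /* buffer too small for the template */
    }
    memcpy(name, pattern, sizeof pattern);
    return mkstemp(name);       /* rewrites `name` in place */
}
```

After a successful call, `name` holds the real file name, which is why the
caller has to keep the buffer around instead of reusing the template.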

The temporary file is now already open, but I only have a file descriptor.  To
use this descriptor with Vala, I have to 'wrap' this file with a `FileStream`
object.  As I don't know whether the input is text or binary, I'll assume that
everything is binary data.

[source, vala]
----
<<open temporary file>>=
GLib.FileStream output = GLib.FileStream.fdopen(file_descriptor, "wb");
if (output == null) {
    throw new FileError.FAILED("Couldn't open temporary file");
}
----

Now it is only a matter of copying the input data to this output file.  I also
explicitly close the output stream (i.e., I set it to `null`), but this is due
to a bug in a previous version of Vala that didn't close files when the
variable went out of scope.

[source, vala]
----
<<copy input to temporary file>>=
copy_stream(input, output);
output = null;
----

The code that copies from one stream to another lives in a separate function
because I also need to do this under certain conditions when copying the
temporary file over the destination file.

In this copy function I can't use `FileStream` member function `read_line` to
read from the input because I assume everything is binary and thus could have
end of string characters (`\0`).  What I need to do is create an array and read
the input chunk by chunk and write each chunk to the output.

One thing I need to be very careful with is the parameters to `FileStream`'s
`read` and `write` member functions.  According to the documentation, the first
parameter is the data array and the second is the size (i.e., the number of
bytes to read from or write to that array).  Vala, however, doesn't work this
way at all.  The second parameter is the *number of times* the *whole* array is
going to be read or written.  Hence, I want this parameter to be `1` (the
default value) almost always.

The problem is that when writing the output, there are times that I don't want
to use the whole array.  This is the case when there is not enough data from
the input to fill up the array.  Unable to specify the number of elements from
the array to write, I am forced to 'slice' the array from the first element up
to the last read character.

Again, due to Vala's compiler errors, this slice must be performed in a
statement by itself or the write doesn't work at all.  Even though the
documentation states that slicing creates a new array, the compiler is smart
enough to use pointers instead.  I guess Vala uses a copy-on-write policy for
slices.

[source, vala]
----
<<function to copy a file over another>>=
void copy_stream(GLib.FileStream input, GLib.FileStream output)
{
    uint8[] input_buffer = new uint8[32768];
    size_t bytes_read = 0;
    while (!input.eof() && (bytes_read = input.read(input_buffer)) > 0) {
        uint8[] output_buffer = input_buffer[0:bytes_read];
        output.write(output_buffer);
    }
}
----
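
For comparison, here is the same loop as a C sketch.  It assumes that
`FileStream.read` and `write` are thin wrappers over `fread` and `fwrite`,
which is my understanding of the binding but is not something this document
verifies.  In C the element size and the element count are separate arguments,
so a partially filled buffer needs no slice.  The name `copy_stream_c` is
hypothetical and used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Copies everything from `input` to `output` in fixed-size chunks.
 * Returns the number of bytes written, which is less than the input
 * size if a write came up short. */
size_t copy_stream_c(FILE *input, FILE *output)
{
    unsigned char buffer[32768];
    size_t total = 0;
    size_t bytes_read;

    assert(input != NULL && output != NULL);
    /* fread(ptr, size, nmemb, stream): with size == 1 the return value
     * is the number of bytes actually read. */
    while ((bytes_read = fread(buffer, 1, sizeof buffer, input)) > 0) {
        /* Write only the bytes just read, not the whole buffer. */
        size_t written = fwrite(buffer, 1, bytes_read, output);
        total += written;
        if (written < bytes_read) {
            break;
        }
    }
    return total;
}
```

Because the data is handled as raw bytes, embedded `\0` characters are copied
like any other value.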

With the standard input's data safely stored in a temporary file, it is now
time to check if that file and the destination file are equal.  Again, I have
to assume that both files contain binary data.

[source, vala]
----
<<function to compare whether to files are equal>>=
bool files_are_equal(string sourceFileName, string destFileName) throws FileError
{
    <<open destination file>>
    <<open source file>>
    <<compare source and destination files>>
}
----

Both files are equal if every byte that I read from one of them is the same
byte I read from the other and I read the end of file at the same time.

[source, vala]
----
<<compare source and destination files>>=
int source = 0;
int dest = 0;
do {
    source = sourceFile.getc();
    dest = destFile.getc();
} while (source == dest && source != GLib.FileStream.EOF);

return source == dest;
----

If the destination file does not exist, then, obviously, the two files are
different.

[source, vala]
----
<<open destination file>>=
GLib.FileStream destFile = GLib.FileStream.open(destFileName, "rb");
if (destFile == null) {
    return false;
}
----

However, if the source file -- the temporary file -- does not exist, then this
is an error that I need to report.

[source, vala]
----
<<open source file>>=
GLib.FileStream sourceFile = GLib.FileStream.open(sourceFileName, "rb");
if (sourceFile == null) {
    throw new FileError.NOENT("Couldn't open source file to compare");
}
----
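
Putting the three comparison fragments together, the logic can be sketched in
plain C as follows.  This is an illustration under the assumption that
`FileStream.getc` behaves like C's `getc`.  The name `files_are_equal_c` is
hypothetical, and unlike the Vala version this sketch reports a missing source
file as "not equal" instead of raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Sketch only: returns true when both files exist and hold the same bytes. */
bool files_are_equal_c(const char *source_name, const char *dest_name)
{
    assert(source_name != NULL && dest_name != NULL);

    FILE *dest = fopen(dest_name, "rb");
    if (dest == NULL) {
        return false;   /* no destination yet, so the files differ */
    }
    FILE *source = fopen(source_name, "rb");
    if (source == NULL) {
        fclose(dest);
        return false;   /* the Vala code throws FileError.NOENT here */
    }

    int s, d;
    do {
        s = getc(source);
        d = getc(dest);
    } while (s == d && s != EOF);

    fclose(source);
    fclose(dest);
    /* Equal only if both streams reached EOF on the same read. */
    return s == d;
}
```

If one file is a prefix of the other, one `getc` returns `EOF` while the other
still returns a byte, so the loop stops with `s != d`.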

The last thing to do is delete the temporary file if it is equal to the
destination.

[source, vala]
----
<<remove temporary file>>=
GLib.FileUtils.remove(tempFileName);
----

Otherwise, overwrite the destination file with the contents of the temporary file.

[source, vala]
----
<<overwrite destination file with temporary file>>=
move_file(tempFileName, destFileName);
----

[source, vala]
----
<<function to move a file>>=
void move_file(string sourceFileName, string destFileName) throws FileError
{
    <<try to move source over destination>>
    if (rename_result < 0) {
        <<copy source to destination>>
        <<remove source file>>
    }
}
----

To move, I first try to rename the source file to the destination file's name.
This is a very lightweight file system operation and is the preferred method to
overwrite the destination file.

[source, vala]
----
<<try to move source over destination>>=
int rename_result = GLib.FileUtils.rename(sourceFileName, destFileName);
----

If the rename operation fails, that probably means that the source and
destination files are in different file systems.  In that case, what I need to
do is open both files and copy the contents from the source to the destination
file.  Here I use again the `copy_stream` function defined earlier.

[source, vala]
----
<<copy source to destination>>=
GLib.FileStream sourceFile = GLib.FileStream.open(sourceFileName, "rb");
if (sourceFile == null) {
    throw new FileError.FAILED("Couldn't open file `%s'".printf(sourceFileName));
}

GLib.FileStream destFile = GLib.FileStream.open(destFileName, "wb");
if (destFile == null) {
    throw new FileError.FAILED("Couldn't open file `%s'".printf(destFileName));
}

copy_stream(sourceFile, destFile);

destFile = null;
sourceFile = null;
----

Once all the data is copied, I should delete the source file.  This isn't
actually required, but I don't like to leave behind temporary files.

[source, vala]
----
<<remove source file>>=
GLib.FileUtils.remove(sourceFileName);
----
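
The whole "rename first, copy as a fallback" strategy can be sketched in plain
C as well.  The sketch assumes the standard `rename` function behaves like the
call behind `GLib.FileUtils.rename`.  On POSIX systems `rename` replaces an
existing destination, but other platforms may differ.  The name `move_file_c`
is hypothetical and the error handling is reduced to a status code.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch only: moves `source_name` over `dest_name`.
 * Returns 0 on success and -1 on failure. */
int move_file_c(const char *source_name, const char *dest_name)
{
    assert(source_name != NULL && dest_name != NULL);

    /* Cheap path: works when both names are on the same file system. */
    if (rename(source_name, dest_name) == 0) {
        return 0;
    }

    /* Fallback: copy the bytes and then delete the source. */
    FILE *source = fopen(source_name, "rb");
    if (source == NULL) {
        return -1;
    }
    FILE *dest = fopen(dest_name, "wb");
    if (dest == NULL) {
        fclose(source);
        return -1;
    }

    unsigned char buffer[4096];
    size_t n;
    int status = 0;
    while ((n = fread(buffer, 1, sizeof buffer, source)) > 0) {
        if (fwrite(buffer, 1, n, dest) != n) {
            status = -1;
            break;
        }
    }
    fclose(source);
    if (fclose(dest) != 0) {
        status = -1;
    }
    if (status == 0) {
        remove(source_name);    /* don't leave the temporary file behind */
    }
    return status;
}
```

The source is removed only after the copy succeeded, so a failed write never
loses the data read from the standard input.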

All the previously defined fragments can be assembled into a single source
code file like this:

[source, vala]
----
<<*>>=
/*
   cpif -- Copy from stdin to a file if the contents are different.
   <<license>>
 */
<<function to copy a file over another>>

<<function to create the temporary file>>

<<function to compare whether to files are equal>>

<<function to move a file>>

<<main function>>
----

//{{{1
Building
--------

A small Makefile should be enough to build and link `cpif` from the source
document.

:3: http://www.methods.co.nz/asciidoc/
:4: http://www.geishastudios.com/literate/atangle.html

The first thing that I need to do is to extract the Vala source code from the
{3}[AsciiDoc] document using {4}[atangle].

[source, make]
----
<<extract vala source code>>=
cpif.vala: cpif.txt
	atangle $< > $@
----

Although it is possible to compile and link the `cpif` executable directly from
the Vala source code using the Vala compiler (`valac`), I prefer to avoid a
dependency on this compiler when distributing the source code to third parties
and thus I generate the C source code from the Vala input.

[source, make]
----
<<generate C source code>>=
cpif.c: cpif.vala
	valac -C -o $@ $<
----

:5: http://www.mingw.org/

This generated source code is regular C code and thus can be built with the
platform's compiler.  However, I need to determine that platform's executable
suffix (i.e., '.exe' for Microsoft® Windows®) if I want to have the correct
Makefile rules.  To detect the system on which the executable is being built, I
use the `uname -s` command available in both GNU/Linux and in {5}[MinGW].

[source, make]
----
<<determine executable suffix>>=
UNAME = $(shell uname -s)
----

Then, I only need to check whether I can find the 'MINGW' string in the
output of `uname`.  If the result of the `findstring` call is the empty string,
then I assume that I am building on a platform without an executable suffix.
This works for most Unix environments (e.g., GNU/Linux and Mac OS® X).

[source, make]
----
<<determine executable suffix>>=
MINGW = $(findstring MINGW, $(UNAME))
ifneq ($(MINGW),)
EXE := .exe
endif
----

With the correct suffix detected, I can now add the correct rule to build the
final `cpif` executable from the C source code.  Being written in Vala, the
executable must be linked to `gobject-2.0` as well.

[source, make]
----
<<build cpif executable>>=
cpif$(EXE): cpif.c
	gcc -o $@ $< `pkg-config --cflags --libs gobject-2.0`
----

It is sometimes useful to have a rule that removes the executable and all the
build artifacts (i.e., the tangled Vala and the generated C source code).
Traditionally this rule is named `clean` and removes all the files that the
Makefile itself made.  I mark this target `PHONY` in case there is a file also
named `clean` in the source directory, in which case make would otherwise
consider the target up to date and skip the rule.

[source, make]
----
<<clean build artifacts>>=
.PHONY: clean

clean:
	rm -fr cpif$(EXE) cpif.vala cpif.c
----

Now I have all the required target rules for cpif.  As the first rule is the
one that make builds by default, I have the executable rule first and then, in
order, its dependencies down to the original document.  After all the
dependencies, I have the `clean` target.  The Makefile's structure, then, is the
following:

----
<<Makefile>>=
<<determine executable suffix>>
<<build cpif executable>>

<<generate C source code>>

<<extract vala source code>>

<<clean build artifacts>>
----


//{{{1
License
-------

This program is distributed under the terms of the GNU General Public License
(GPL) version 2.0 as follows:

----
<<license>>=
Copyright (c) 2011 Jordi Fita <jfita@geishastudios.com>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2.0 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
----
