This section details the implementation of an example ``naive'' encoding scheme and compares it to the SOLE encoding scheme. The naive approach is simple and fast, but it sacrifices a great deal of space in the encoding. Our hypothesis is that even an extremely fast computation step is insignificant compared to the IO-bound operations, where wasted space may be a determining factor in overall speed. This is discussed further in \Fref{sec:point-comp}. All tests were performed on an Intel Core i5-2400, which has four cores with a base frequency of 3.10 GHz. The machine also had 8 GB of RAM.

\subsection{A naive encoding scheme}

A straightforward encoding scheme simply copies every bit of the input stream to its output destination, following each bit with a 1-bit if it was the last bit of the stream and with a 0-bit otherwise. This scheme is extremely simple and can be implemented somewhat efficiently. Assuming 64-bit registers and arithmetic operations, the scheme is implemented in the following manner:
\begin{itemize}[noitemsep]
  \item Grab the next 32 bits of data; call them $x$.
  \item Obtain the 64-bit number $z = \texttt{zmort}(x)$.
  \item If $x$ contained the last bits of the input stream, flip the second-most significant bit in the last byte of $z$.
\end{itemize}
The \texttt{zmort()} function intersperses the bits of its operand with zeroes, effectively computing a Morton number with one operand being zero. In our implementation this means that all odd bits are zero, except at the very end of the input. The transformation can be computed in 15 operations per four words, so it is very fast (see the file \texttt{naive/naive.c} for details; an illustrative sketch of this step is shown just before the results in \Fref{sec:point-comp}).

\subsection{Hypothesis}

In theory, the SOLE encoding scheme should run in the time it takes to copy a file, modulo computational overhead, since the increase in size is a constant number of blocks (two or three). The naive encoding scheme, on the other hand, should be expected to take roughly twice as long as copying the given file, since it involves very little processing but doubles the amount of data written: a 1 GiB input, for instance, produces roughly 2 GiB of naive output, whereas the SOLE output exceeds 1 GiB by only a couple of blocks. Our expectation is that the IO-bound overhead of writing a much larger file outweighs the CPU-bound overhead involved in SOLE encoding (when compared to the naive method).

Another (perhaps understated) hypothesis is that arithmetic operations implemented by hand for a specific purpose perform better than a generalised multi-precision library such as GMP. To address this, the encoding functionality of SOLE was also implemented using GMP (see \texttt{src/sole-gmp.c}) and included in the benchmarks.

\subsection{Points of comparison}\label{sec:point-comp}

The benchmarks were carried out as follows: a file of a given size is created by pulling random bits from \texttt{/dev/urandom}. The program \texttt{encode\_file} is then invoked with this new file as input and some output destination filename. \texttt{encode\_file} is implemented in such a way as to time a function call (the encoding function) and print the resulting time span. The timing is performed as follows (see \texttt{examples/encode\_file.c} for more details):
\begin{ccode}
void time_call(int (*f)(FILE *source, FILE *dest), FILE *source, FILE *dest)
{
    double time_spent;
    clock_t begin, end;

    begin = clock();            /* CPU time before the call */
    f(source, dest);            /* the encoding function under test */
    end = clock();              /* CPU time after the call */

    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent);
}
\end{ccode}
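Since the per-word work of the two encoders differs so greatly, it is worth making the naive scheme's bit-spreading step concrete before turning to the numbers. The following is a minimal scalar sketch of that step; the name \texttt{zmort\_scalar} and the exact sequence of rounds are illustrative only, as the actual implementation in \texttt{naive/naive.c} processes four words at a time and also handles the end-of-stream flag.
\begin{ccode}
#include <stdint.h>

/* Spread the 32 bits of x into the even bit positions of a 64-bit word,
 * leaving every odd bit zero: a Morton number with a zero second operand.
 * Five rounds of shift/or/mask, i.e. 15 operations for a single word. */
uint64_t zmort_scalar(uint32_t x)
{
    uint64_t z = x;
    z = (z | (z << 16)) & 0x0000FFFF0000FFFFULL;
    z = (z | (z <<  8)) & 0x00FF00FF00FF00FFULL;
    z = (z | (z <<  4)) & 0x0F0F0F0F0F0F0F0FULL;
    z = (z | (z <<  2)) & 0x3333333333333333ULL;
    z = (z | (z <<  1)) & 0x5555555555555555ULL;
    return z;
}
\end{ccode}
Spreading the data bits to the even positions leaves every odd bit zero, so each 32-bit input word becomes a 64-bit output word and the encoded file is roughly twice the size of the input; this doubling is precisely the IO cost that the hypothesis above weighs against SOLE's computational overhead.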
\Fref{tab:avgs} shows the results of running the above scheme for increasing file sizes (in KiB) from 1 to 1048586 (roughly 1 GiB). The numbers are averages over one hundred runs for each file size, with a new random file generated for every run. We also report the standard deviation for each size.

\begin{table}[h]\centering
  \begin{tabular}{|l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|}
    \hline
    Size & \multicolumn{4}{c|}{SOLE} & \multicolumn{4}{c|}{Naive} & \multicolumn{4}{c|}{SOLE w/GMP} \\
    \cline{2-13}
    (KiB) & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} \\
    \hline
    1 & 0 & 000055 & 0 & 000008 & 0 & 000019 & 0 & 000004 & 0 & 000068 & 0 & 000010 \\ \hline
    2 & 0 & 000073 & 0 & 000004 & 0 & 000016 & 0 & 000001 & 0 & 000086 & 0 & 000011 \\ \hline
    512 & 0 & 009905 & 0 & 000775 & 0 & 001333 & 0 & 000121 & 0 & 008824 & 0 & 001351 \\ \hline
    1024 & 0 & 019467 & 0 & 001687 & 0 & 002725 & 0 & 000382 & 0 & 016637 & 0 & 001162 \\ \hline
    2048 & 0 & 040434 & 0 & 006605 & 0 & 005488 & 0 & 000957 & 0 & 033939 & 0 & 004220 \\ \hline
    8192 & 0 & 156327 & 0 & 010047 & 0 & 022010 & 0 & 003092 & 0 & 134952 & 0 & 011984 \\ \hline
    32768 & 0 & 618616 & 0 & 010589 & 0 & 090904 & 0 & 012894 & 0 & 531491 & 0 & 019428 \\ \hline
    65536 & 1 & 24506 & 0 & 021578 & 0 & 17601 & 0 & 011479 & 1 & 05494 & 0 & 017803 \\ \hline
    131072 & 2 & 47669 & 0 & 027149 & 0 & 346551 & 0 & 013153 & 2 & 11318 & 0 & 031469 \\ \hline
    262144 & 4 & 93434 & 0 & 035537 & 0 & 684954 & 0 & 019765 & 4 & 22327 & 0 & 072468 \\ \hline
    524288 & 9 & 91757 & 0 & 117467 & 1 & 3558 & 0 & 013844 & 8 & 13143 & 0 & 15325 \\ \hline
    1048586 & 19 & 6984 & 0 & 128769 & 2 & 71217 & 0 & 042001 & 16 & 3364 & 0 & 46059 \\ \hline
  \end{tabular}
  \caption{Time averages (in seconds) and standard deviations for the benchmarking runs.}
  \label{tab:avgs}
\end{table}

As is evident from the numbers, the naive encoding, although it had to output a much larger file, was significantly faster than both implementations of SOLE. Between the SOLE implementations, the version using GMP performed slightly better than the original zero-dependency version. This goes to show that implementing arithmetic operations by hand, even for fairly specific purposes, requires a substantial effort to outdo multi-precision libraries such as GMP.

It is also interesting to inspect the standard deviations as the size of the input files grows. For both versions of SOLE this number grows steadily, indicating a wider spread in processing time across different files of the same size. This could be due to the randomness of the input, as some files may simply require more processing, but we cannot conclude this with certainty.

\subsection{Conclusion}

We have discussed the details of the SOLE encoding scheme, but a lot remains to be done. Decoding requires division operations that proved difficult to implement with the design of the API, so it is only implemented in the Haskell version. Based on the data in \fref{tab:avgs}, it is clear that a more interesting treatment of this topic would have been to implement the algorithm using GMP from the outset, as this would have allowed a treatment of decoding and more interesting test cases. A great deal of time was spent implementing arithmetic operations on the custom data types (division in particular).