This section details the implementation of an example ``naive'' encoding scheme and compares it to the SOLE encoding scheme. The naive approach is simple and fast, but it sacrifices a great deal of space in the encoding. Our hypothesis is that even an extremely fast computation step is insignificant compared to the IO-bound operations, where wasted space may be a determining factor in overall speed. This is discussed further in \Fref{sec:point-comp}. All tests were performed on an Intel Core i5-2400, which has four cores with a base frequency of 3.10 GHz. The machine also had 8 GB of RAM.

\subsection{A naive encoding scheme}

A straightforward encoding scheme simply copies every bit of the input stream to its output destination, following each bit with a 1-bit if it was the last bit of the stream and with a 0-bit otherwise. This scheme is extremely simple and can be implemented somewhat efficiently. Assuming 64-bit registers and arithmetic operations, the scheme is implemented in the following manner:
\begin{itemize}[noitemsep]
  \item Grab the next 32 bits of data; call them $x$.
  \item Obtain the 64-bit number $z = \texttt{zmort}(x)$.
  \item If $x$ contained the last bits of the input stream, flip the second-most significant bit in the last byte of $z$.
\end{itemize}
The \texttt{zmort()} function intersperses the bits of its operand with zeroes, effectively computing a Morton number with one operand being zero. In our implementation this means that all odd bits are zero, except at the very end of the input. The transformation can be computed in 15 operations per four words, so it is very fast (see the file \texttt{naive/naive.c} for details; an illustrative sketch of this step is shown just before the results in \Fref{sec:point-comp}).

\subsection{Hypothesis}

In theory, the SOLE encoding scheme should run in the time it takes to copy a file, modulo computational overhead, since the increase in size is a constant number of blocks (two or three). The naive encoding scheme, on the other hand, should be expected to take roughly twice as long as copying the given file, since it involves very little processing but doubles the amount of data written: a 1 GiB input, for instance, produces roughly 2 GiB of naive output, whereas the SOLE output exceeds 1 GiB by only a couple of blocks. Our expectation is that the IO-bound overhead of writing a much larger file outweighs the CPU-bound overhead involved in SOLE encoding (when compared to the naive method).

Another (perhaps understated) hypothesis is that arithmetic operations implemented by hand for a specific purpose perform better than a generalised multi-precision library such as GMP. To address this, the encoding functionality of SOLE was also implemented using GMP (see \texttt{src/sole-gmp.c}) and included in the benchmarks.

\subsection{Points of comparison}\label{sec:point-comp}

The benchmarks were carried out as follows: a file of a given size is created by pulling random bits from \texttt{/dev/urandom}. The program \texttt{encode\_file} is then invoked with this new file as input and some output destination filename. \texttt{encode\_file} is implemented in such a way as to time a function call (the encoding function) and print the resulting time span. The timing is performed as follows (see \texttt{examples/encode\_file.c} for more details):
\begin{ccode}
void time_call(int (*f)(FILE *source, FILE *dest), FILE *source, FILE *dest)
{
    double time_spent;
    clock_t begin, end;

    begin = clock();            /* CPU time before the call */
    f(source, dest);            /* the encoding function under test */
    end = clock();              /* CPU time after the call */

    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent);
}
\end{ccode}
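Since the per-word work of the two encoders differs so greatly, it is worth making the naive scheme's bit-spreading step concrete before turning to the numbers. The following is a minimal scalar sketch of that step; the name \texttt{zmort\_scalar} and the exact sequence of rounds are illustrative only, as the actual implementation in \texttt{naive/naive.c} processes four words at a time and also handles the end-of-stream flag.
\begin{ccode}
#include <stdint.h>

/* Spread the 32 bits of x into the even bit positions of a 64-bit word,
 * leaving every odd bit zero: a Morton number with a zero second operand.
 * Five rounds of shift/or/mask, i.e. 15 operations for a single word. */
uint64_t zmort_scalar(uint32_t x)
{
    uint64_t z = x;
    z = (z | (z << 16)) & 0x0000FFFF0000FFFFULL;
    z = (z | (z <<  8)) & 0x00FF00FF00FF00FFULL;
    z = (z | (z <<  4)) & 0x0F0F0F0F0F0F0F0FULL;
    z = (z | (z <<  2)) & 0x3333333333333333ULL;
    z = (z | (z <<  1)) & 0x5555555555555555ULL;
    return z;
}
\end{ccode}
Spreading the data bits to the even positions leaves every odd bit zero, so each 32-bit input word becomes a 64-bit output word and the encoded file is roughly twice the size of the input; this doubling is precisely the IO cost that the hypothesis above weighs against SOLE's computational overhead.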
\Fref{tab:avgs} shows the results of running the above scheme for increasing file sizes (in KiB) from 1 to 1048586 (roughly 1 GiB). The numbers are averages over one hundred runs for each file size, with a new random file generated for every run. We also report the standard deviation for each size.

\begin{table}[h]\centering
  \begin{tabular}{|l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|r@{.}l|}
    \hline
    Size & \multicolumn{4}{c|}{SOLE} & \multicolumn{4}{c|}{Naive} & \multicolumn{4}{c|}{SOLE w/GMP} \\
    \cline{2-13}
    (KiB) & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} & \multicolumn{2}{c|}{$\mu$} & \multicolumn{2}{c|}{$\sigma$} \\
    \hline
    1 & 0 & 000055 & 0 & 000008 & 0 & 000019 & 0 & 000004 & 0 & 000068 & 0 & 000010 \\ \hline
    2 & 0 & 000073 & 0 & 000004 & 0 & 000016 & 0 & 000001 & 0 & 000086 & 0 & 000011 \\ \hline
    512 & 0 & 009905 & 0 & 000775 & 0 & 001333 & 0 & 000121 & 0 & 008824 & 0 & 001351 \\ \hline
    1024 & 0 & 019467 & 0 & 001687 & 0 & 002725 & 0 & 000382 & 0 & 016637 & 0 & 001162 \\ \hline
    2048 & 0 & 040434 & 0 & 006605 & 0 & 005488 & 0 & 000957 & 0 & 033939 & 0 & 004220 \\ \hline
    8192 & 0 & 156327 & 0 & 010047 & 0 & 022010 & 0 & 003092 & 0 & 134952 & 0 & 011984 \\ \hline
    32768 & 0 & 618616 & 0 & 010589 & 0 & 090904 & 0 & 012894 & 0 & 531491 & 0 & 019428 \\ \hline
    65536 & 1 & 24506 & 0 & 021578 & 0 & 17601 & 0 & 011479 & 1 & 05494 & 0 & 017803 \\ \hline
    131072 & 2 & 47669 & 0 & 027149 & 0 & 346551 & 0 & 013153 & 2 & 11318 & 0 & 031469 \\ \hline
    262144 & 4 & 93434 & 0 & 035537 & 0 & 684954 & 0 & 019765 & 4 & 22327 & 0 & 072468 \\ \hline
    524288 & 9 & 91757 & 0 & 117467 & 1 & 3558 & 0 & 013844 & 8 & 13143 & 0 & 15325 \\ \hline
    1048586 & 19 & 6984 & 0 & 128769 & 2 & 71217 & 0 & 042001 & 16 & 3364 & 0 & 46059 \\ \hline
  \end{tabular}
  \caption{Time averages (in seconds) and standard deviations for the benchmarking runs.}
  \label{tab:avgs}
\end{table}

As is evident from the numbers, the naive encoding, although it had to output a much larger file, was significantly faster than both implementations of SOLE. Between the SOLE implementations, the version using GMP performed slightly better than the original zero-dependency version. This goes to show that implementing arithmetic operations by hand, even for fairly specific purposes, requires a substantial effort to outdo multi-precision libraries such as GMP.

It is also interesting to inspect the standard deviations as the size of the input files grows. For both versions of SOLE this number grows steadily, indicating a wider spread in processing time across different files of the same size. This could be due to the randomness of the input, as some files may simply require more processing, but we cannot conclude this with certainty.

\subsection{Conclusion}

We have discussed the details of the SOLE encoding scheme, but a lot remains to be done. Decoding requires division operations that proved difficult to implement with the design of the API, so it is only implemented in the Haskell version. Based on the data in \fref{tab:avgs}, it is clear that a more interesting treatment of this topic would have been to implement the algorithm using GMP from the outset, as this would have allowed a treatment of decoding and more interesting test cases. A great deal of time was spent implementing arithmetic operations on the custom data types (division in particular).