At the heart of the algorithm lies the notion of changing base. As motivated by the vector encoding problem, suppose we have a finite alphabet $\Sigma$ of size $B = |\Sigma|$ and wish to encode a vector $A[1\dots n]$ of elements from $\Sigma$. The optimal space is $\lceil n \log_2{B}\rceil$ bits, and we can achieve this by viewing the vector as a number in the range $[0, B^n-1]$ and converting this number to binary. This is exactly a base change, although it suffers from poor locality. However, building on this idea, we will in this section introduce the SOLE encoding algorithm. \subsection{Encoding an input stream} We assume an input stream of $n$ blocks of $b$ bits each, with the guarantee that $n < 2^b$. \begin{center} \begin{tabular}{l*{5}{c}l} Input stream: & $x_0$ & $x_1$ & $x_2$ & $x_3$ & $x_4$ & $\cdots$\\ \end{tabular} \end{center} Each block $x_i$ can be viewed as a ``character'' in an alphabet consisting of $B = 2^b$ characters, i.e.\ $x_i\in[B]$.\footnote{We denote $[B] = \{0,\dots,B-1\}$.} To indicate the end of the stream (whose position we may not know in advance), we introduce a new character \eof, which gives us an alphabet of size $B+1$. This new alphabet has the worst possible size: a straight-forward encoding of its characters requires $b+1$ bits per character, yet uses only $B+1$ of the $2^{b+1}$ available codewords, wasting almost half of the encoding space. The goal is therefore to \emph{efficiently} move from the new alphabet of size $B+1$ back to an alphabet of size $B$, achieving a tighter encoding. \subsection{Overview of the algorithm} Conceptually the encoding consists of two phases, each making a pass over the input, and we present them here separately. In the following $B$ is our input alphabet size, with $B = 2^b$. For a comprehensive overview, refer to Figure~\ref{fig:sole-overview}. It is assumed in the following that $n$ is odd, so that adding the \eof block yields an even number of blocks. When $n$ is even we simply pad with an extra \eof. \begin{figure} \resizebox{.9\textwidth}{!}{\begin{minipage}{\textwidth} \begin{center} \begin{tabular}{|l|cccccccc|l|} \hline Block \# & \multicolumn{1}{c|}{1} & \multicolumn{1}{c|}{2} & \multicolumn{1}{c|}{3} & \multicolumn{1}{c|}{4} & \multicolumn{1}{c|}{5} & \multicolumn{1}{c|}{6} & \multicolumn{1}{c|}{7} & 8 & \dots \\ \hline Input alph. 
& \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & $B$ & \dots \\ \hline With \eof & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & $B+1$ & \dots \\ \hline Pass 1 & $B$ & \multicolumn{1}{c|}{$B+3$} & $B-3$ & \multicolumn{1}{c|}{$B+6$} & $B-6$ & \multicolumn{1}{c|}{$B+9$} & $B-9$ & $B+12$ & \dots \\ \hline Regroup & \multicolumn{1}{c|}{$B$} & $B+3$ & \multicolumn{1}{c|}{$B-3$} & $B+6$ & \multicolumn{1}{c|}{$B-6$} & $B+9$ & \multicolumn{1}{c|}{$B-9$} & $B+12$ & \dots \\ \hline Pass 2 & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & $B$ & \dots \\ \hline \end{tabular} \end{center} \end{minipage} } \caption{Conceptual overview of SOLE}\label{fig:sole-overview} \end{figure} \paragraph{Pass 1} The first pass pairs input blocks $(x_{2i}, x_{2i+1})$ for $i = 0,1,\dots,\lceil\frac{n}{2}\rceil - 1$ and for each pair computes the number \begin{align*} x_i' = x_{2i}(B+1) + x_{2i+1} \in [(B+1)^2], \end{align*} which is decomposed into two new numbers $y_i$ and $z_i$ by dividing by $(B-3i)$: $z_i$ denotes the remainder and $y_i$ the quotient of this operation. By virtue of the division we know $z_i \in [B-3i]$. The quotient $y_i$ is bounded by the ratio between $(B+1)^2$ and $(B-3i)$, which is shown in~\cite{patrascu10trits} to be at most $B+3i+3$, so $y_i\in [B+3i+3]$. After this first pass we obtain a list of alternating remainders and quotients \begin{align*} z_0, y_0, z_1, y_1, z_2, y_2,\dots \end{align*} where $z_i\in [B-3i]$ and $y_i\in[B+3i+3]$ for $i=0,\dots, \lceil\frac{n}{2}\rceil - 1$. \paragraph{Pass 2} The second pass regroups the blocks produced in Pass 1. Notice that the very first block obtained, $z_0$, is already a number in $[B]$, so we do not process it further but simply output it. The remaining blocks are considered in pairs $(y_0, z_1), (y_1, z_2),\dots$, i.e.\ the spill $y_{i-1}$ produced by pair $i-1$ is combined with the remainder $z_i$ produced by pair $i$. The important observation here is that $y_{i-1}\in [B+3i]$ and $z_i\in[B-3i]$. This allows us to rewrite $(y_{i-1}, z_i)$ as a single number in $[B^2]$, since $(B-3i)(B+3i) < B^2$. A straight-forward transformation (which encodes an enumeration) is \begin{align*} w_i = z_i(B+3i) + y_{i-1}. \end{align*} The resulting $w_i\in [B^2]$ can now easily be decomposed into two output blocks of $b$ bits, taking the upper $b$ bits of $w_i$ as the first block and the lower $b$ bits as the second. \paragraph{Encoding \eof} In the above we barely mentioned how \eof is handled. We only assumed $n$ was odd, such that appending \eof yields an even number of blocks; the last block produced by Pass 1 is then a spill lying in $[B+3j]$ for some $j$. During Pass 2, when we reach this last block it has no remainder left to be paired with, so we pair it with an artificially inserted zero block. This zero block lies in the range $[B-3j]$ and can therefore be combined with the last block and output as usual. As mentioned at the beginning of this section, for even $n$ we append a second \eof and apply the same termination scheme. 
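To make the two conceptual passes concrete, the following sketch runs both passes over a complete input, using Python's arbitrary-precision integers and modelling \eof{} as the extra character with value $B$. The function name \texttt{sole\_encode} is ours and not part of any reference implementation; the sketch follows the description above literally (in particular it assumes $B-3i$ remains positive for the given $n$) and ignores the word-level arithmetic discussed in the next subsection.

\begin{verbatim}
def sole_encode(blocks, b):
    B = 1 << b
    EOF = B                                   # the extra (B+1)-st character
    xs = list(blocks) + [EOF]
    if len(xs) % 2 == 1:                      # even n: pad with a second EOF
        xs.append(EOF)

    # Pass 1: compose each pair in base B+1, then divide by B-3i.
    zs, ys = [], []
    for i in range(len(xs) // 2):
        w = xs[2*i] * (B + 1) + xs[2*i + 1]   # in [(B+1)^2]
        y, z = divmod(w, B - 3*i)             # z in [B-3i], y in [B+3i+3]
        zs.append(z)
        ys.append(y)

    out = [zs[0]]                             # z_0 is already in [B]
    # Pass 2: pair each spill y_{i-1} with the next remainder z_i.
    for i in range(1, len(zs)):
        w = zs[i] * (B + 3*i) + ys[i - 1]     # in [(B-3i)(B+3i)], hence < B^2
        out += [w >> b, w & (B - 1)]          # upper and lower b bits
    # Termination: pair the final spill with an artificial zero remainder.
    j = len(zs)
    w = 0 * (B + 3*j) + ys[-1]
    out += [w >> b, w & (B - 1)]
    return out
\end{verbatim}

A decoder would run the two passes in reverse, using the known block ranges $[B\pm 3i]$ to undo each enumeration; we return to the required arithmetic below.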
\subsection{Arithmetic operations} This section discusses the three transformations involved in the encoding, along with some observations useful for an efficient implementation. We also briefly discuss the decoding strategy. The entire algorithm essentially consists of three injective mappings: two in Pass 1 (a composition and a decomposition) and one in Pass 2 (a composition). Arguably Pass 2 also performs a decomposition, taking a value in $[B^2]$ and decomposing it into two numbers in $[B]$, but this poses no practical challenges. \paragraph{Pass 1: Composition} The first transformation takes input blocks $x_i$ and $x_{i+1}$ and composes them by computing the number \begin{align*} x_i(B+1) + x_{i+1}. \end{align*} This composition can also be written as $x_iB + x_i + x_{i+1}$, and our implementation can exploit the fact that $B = 2^b$ is a power of two and perform the multiplication by shifting $x_i$ left by $b$ bits. The entire operation then boils down to a bit shift and two additions. The inverse of this operation is dividing a number $w = x_i(B+1) + x_{i+1}$ by $(B+1)$. Instead of naively dividing $w$ by $B+1$, we can observe that $w$ has the following structure: \begin{center} \begin{tabular}{r|x{3cm}|x{3cm}|}\cline{2-3} Double block ($2b$ bits) & \multicolumn{1}{c|}{$b$} & \multicolumn{1}{c|}{$b$} \tabularnewline\cline{2-3} $w=$ & $x_i + c$ & $(x_i + x_{i+1}) \bmod{B}$ \tabularnewline\cline{2-3} \end{tabular} \end{center} where the most significant $b$ bits are $x_i$ plus a carry $c\in\{0,1\}$, with $c = 0$ if and only if $x_i+x_{i+1}\in [B]$. In other words, the ``upper'' part of $w$ is either the value $x_i+1$ or $x_i$, depending on whether or not the sum $x_i + x_{i+1}$ overflowed. Compute $x' = w \gg b$ and $y' = w - x'B$. Then $x'$ equals either $x_i$ or $x_i+1$, and we can determine which by comparing $x'$ and $y'$. There are three cases to consider: \begin{itemize}[noitemsep] \item $x' > y'$. This indicates that $x_i + x_{i+1} \geq B$, and we find $x_{i}$ by subtracting one from $x'$. \item $x' = y'$. This is only possible when $x_{i+1} = 0$; in that case $x_i = x'$ and $x_{i+1} = 0$. \item $x' < y'$. The addition cannot have overflowed the block size $B$, and thus $x_i = x'$. \end{itemize} In the first and last cases above, once we have determined $x_i$, we recover $x_{i+1}$ as $y' + B - x_i$ and $y' - x_i$, respectively. \paragraph{Pass 1: Decomposition} This step decomposes a composed double block $w$ by dividing by $B-3i$ for the given $i$. Here we require an implementation of a division of a $2b$-bit number by a $b$-bit number; the details of this operation are discussed in Section~\ref{sec:impl}. Inverting this operation takes the obtained quotient and remainder, $q_i$ and $r_i$ respectively, and computes \begin{align*} w = q_i(B-3i) + r_i. \end{align*} This operation is similar to the composition from Pass 1, and the same observations apply; we can rewrite it as $q_iB - 3i\cdot q_i + r_i$ and perform the multiplication $q_iB$ as a $b$-bit shift. However, we still require a multiplication $3i\cdot q_i$, along with both addition and subtraction of $2b$-bit numbers. 
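To illustrate the case analysis above, here is a minimal sketch of the Pass 1 composition and its inverse, assuming the $b$-bit blocks are held in ordinary Python integers; the function names \texttt{compose} and \texttt{decompose} are ours.

\begin{verbatim}
def compose(x_i, x_next, b):
    # x_i*(B+1) + x_next == (x_i << b) + x_i + x_next: one shift, two additions.
    return (x_i << b) + x_i + x_next

def decompose(w, b):
    B = 1 << b
    x_prime = w >> b              # upper b bits: x_i, or x_i + 1 if a carry occurred
    y_prime = w - (x_prime << b)  # lower b bits: (x_i + x_next) mod B
    if x_prime > y_prime:         # the sum x_i + x_next overflowed (>= B)
        x_i = x_prime - 1
        return x_i, y_prime + B - x_i
    # x_prime <= y_prime: no overflow (equality means x_next = 0)
    return x_prime, y_prime - x_prime
\end{verbatim}

For any two blocks in $[B]$, applying \texttt{decompose} to the result of \texttt{compose} returns the original pair without performing a general division by $B+1$.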
\paragraph{Pass 2: Regrouped composition} From one decomposition we have a quotient $q\in[B+3i+3]$, also called ``the spill'', and we combine it with the remainder from the next decomposition, $r\in[B-3(i+1)]$. The encoding step is fairly straight-forward and in the same vein as the composition step from Pass 1: as described in the overview, we compute the enumeration \begin{align*} w = r(B+3i+3) + q \in [(B-3(i+1))(B+3(i+1))] \subseteq [B^2], \end{align*} and output $w$ as two blocks of $b$ bits each. The decoding is in principle a division by $(B+3i+3)$, but we will again exploit the structure of the composition of $w$ to obtain a faster division method.
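As a small sketch of this regrouping step (again in Python, with our own helper names, and using a plain division for the inverse in place of the faster structured division just mentioned):

\begin{verbatim}
def regroup(q, r, i, b):
    # Combine the spill q in [B+3i+3] with the next remainder r in [B-3(i+1)].
    B = 1 << b
    M = B + 3*i + 3                 # alphabet size of the spill q
    w = r * M + q                   # in [(B-3(i+1)) * (B+3(i+1))], hence < B^2
    return w >> b, w & (B - 1)      # two b-bit output blocks

def ungroup(hi, lo, i, b):
    # Inverse: reassemble w and divide by the spill's alphabet size.
    B = 1 << b
    w = (hi << b) | lo
    r, q = divmod(w, B + 3*i + 3)
    return q, r
\end{verbatim}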