At the heart of the algorithm lies the notion of changing base. As motivated by the vector encoding problem, suppose we have a finite alphabet $\Sigma$ of size $B = |\Sigma|$ and wish to encode a vector $A[1\dots n]$ of elements from $\Sigma$. The optimal space is $\lceil n \log_2{B}\rceil$ bits, and we can achieve this by viewing the vector as a number in the range $[0, B^n-1]$ and converting this number to binary. This is exactly a base change, although it suffers from poor locality. However, building on this idea, we will in this section introduce the SOLE encoding algorithm. \subsection{Encoding an input stream} We assume an input stream of $n$ blocks of $b$ bits each, with the guarantee that $n < 2^b$. \begin{center} \begin{tabular}{l*{5}{c}l} Input stream: & $x_0$ & $x_1$ & $x_2$ & $x_3$ & $x_4$ & $\cdots$\\ \end{tabular} \end{center} Each block $x_i$ can be viewed as a ``character'' in an alphabet consisting of $B = 2^b$ characters, i.e.\ $x_i\in[B]$.\footnote{We denote $[B] = \{0,\dots,B-1\}$.} To indicate the end of the stream (whose position we may not know in advance), we introduce a new character \eof, which gives us an alphabet of size $B+1$. This new alphabet has the worst possible size: a straight-forward encoding of its characters requires $b+1$ bits per character, yet uses only $B+1$ of the $2^{b+1}$ available codewords, wasting almost half of the encoding space. The goal is therefore to \emph{efficiently} move from the new alphabet of size $B+1$ back to an alphabet of size $B$, achieving a tighter encoding. \subsection{Overview of the algorithm} Conceptually the encoding consists of two phases, each making a pass over the input, and we present them here separately. In the following $B$ is our input alphabet size, with $B = 2^b$. For a comprehensive overview, refer to Figure~\ref{fig:sole-overview}. It is assumed in the following that $n$ is odd, so that adding the \eof block yields an even number of blocks. When $n$ is even we simply pad with an extra \eof. \begin{figure} \resizebox{.9\textwidth}{!}{\begin{minipage}{\textwidth} \begin{center} \begin{tabular}{|l|cccccccc|l|} \hline Block \# & \multicolumn{1}{c|}{1} & \multicolumn{1}{c|}{2} & \multicolumn{1}{c|}{3} & \multicolumn{1}{c|}{4} & \multicolumn{1}{c|}{5} & \multicolumn{1}{c|}{6} & \multicolumn{1}{c|}{7} & 8 & \dots \\ \hline Input alph. 
& \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & $B$ & \dots \\ \hline With \eof & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & \multicolumn{1}{c|}{$B+1$} & $B+1$ & \dots \\ \hline Pass 1 & $B$ & \multicolumn{1}{c|}{$B+3$} & $B-3$ & \multicolumn{1}{c|}{$B+6$} & $B-6$ & \multicolumn{1}{c|}{$B+9$} & $B-9$ & $B+12$ & \dots \\ \hline Regroup & \multicolumn{1}{c|}{$B$} & $B+3$ & \multicolumn{1}{c|}{$B-3$} & $B+6$ & \multicolumn{1}{c|}{$B-6$} & $B+9$ & \multicolumn{1}{c|}{$B-9$} & $B+12$ & \dots \\ \hline Pass 2 & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & \multicolumn{1}{c|}{$B$} & $B$ & \dots \\ \hline \end{tabular} \end{center} \end{minipage} } \caption{Conceptual overview of SOLE}\label{fig:sole-overview} \end{figure} \paragraph{Pass 1} The first pass pairs input blocks $(x_{2i}, x_{2i+1})$ for $i = 0,1,\dots,\lceil\frac{n}{2}\rceil - 1$ and for each pair computes the number \begin{align*} x_i' = x_{2i}(B+1) + x_{2i+1} \in [(B+1)^2], \end{align*} which is decomposed into two new numbers $y_i$ and $z_i$ by dividing by $(B-3i)$: $z_i$ denotes the remainder and $y_i$ the quotient of this operation. By virtue of the division we know $z_i \in [B-3i]$. The quotient $y_i$ is bounded by the ratio between $(B+1)^2$ and $(B-3i)$, which is shown in~\cite{patrascu10trits} to be at most $B+3i+3$, so $y_i\in [B+3i+3]$. After this first pass we obtain a list of alternating remainders and quotients \begin{align*} z_0, y_0, z_1, y_1, z_2, y_2,\dots \end{align*} where $z_i\in [B-3i]$ and $y_i\in[B+3i+3]$ for $i=0,\dots, \lceil\frac{n}{2}\rceil - 1$. \paragraph{Pass 2} The second pass regroups the blocks produced in Pass 1. Notice that the very first block obtained, $z_0$, is already a number in $[B]$, so we do not process it further but simply output it. The remaining blocks are considered in pairs $(y_0, z_1), (y_1, z_2),\dots$, i.e.\ the spill $y_{i-1}$ produced by pair $i-1$ is combined with the remainder $z_i$ produced by pair $i$. The important observation here is that $y_{i-1}\in [B+3i]$ and $z_i\in[B-3i]$. This allows us to rewrite $(y_{i-1}, z_i)$ as a single number in $[B^2]$, since $(B-3i)(B+3i) < B^2$. A straight-forward transformation (which encodes an enumeration) is \begin{align*} w_i = z_i(B+3i) + y_{i-1}. \end{align*} The resulting $w_i\in [B^2]$ can now easily be decomposed into two output blocks of $b$ bits, taking the upper $b$ bits of $w_i$ as the first block and the lower $b$ bits as the second. \paragraph{Encoding \eof} In the above we barely mentioned how \eof is handled. We only assumed $n$ was odd, such that appending \eof yields an even number of blocks; the last block produced by Pass 1 is then a spill lying in $[B+3j]$ for some $j$. During Pass 2, when we reach this last block it has no remainder left to be paired with, so we pair it with an artificially inserted zero block. This zero block lies in the range $[B-3j]$ and can therefore be combined with the last block and output as usual. As mentioned at the beginning of this section, for even $n$ we append a second \eof and apply the same termination scheme. 
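To make the two conceptual passes concrete, the following sketch runs both passes over a complete input, using Python's arbitrary-precision integers and modelling \eof{} as the extra character with value $B$. The function name \texttt{sole\_encode} is ours and not part of any reference implementation; the sketch follows the description above literally (in particular it assumes $B-3i$ remains positive for the given $n$) and ignores the word-level arithmetic discussed in the next subsection.

\begin{verbatim}
def sole_encode(blocks, b):
    B = 1 << b
    EOF = B                                   # the extra (B+1)-st character
    xs = list(blocks) + [EOF]
    if len(xs) % 2 == 1:                      # even n: pad with a second EOF
        xs.append(EOF)

    # Pass 1: compose each pair in base B+1, then divide by B-3i.
    zs, ys = [], []
    for i in range(len(xs) // 2):
        w = xs[2*i] * (B + 1) + xs[2*i + 1]   # in [(B+1)^2]
        y, z = divmod(w, B - 3*i)             # z in [B-3i], y in [B+3i+3]
        zs.append(z)
        ys.append(y)

    out = [zs[0]]                             # z_0 is already in [B]
    # Pass 2: pair each spill y_{i-1} with the next remainder z_i.
    for i in range(1, len(zs)):
        w = zs[i] * (B + 3*i) + ys[i - 1]     # in [(B-3i)(B+3i)], hence < B^2
        out += [w >> b, w & (B - 1)]          # upper and lower b bits
    # Termination: pair the final spill with an artificial zero remainder.
    j = len(zs)
    w = 0 * (B + 3*j) + ys[-1]
    out += [w >> b, w & (B - 1)]
    return out
\end{verbatim}

A decoder would run the two passes in reverse, using the known block ranges $[B\pm 3i]$ to undo each enumeration; we return to the required arithmetic below.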
\subsection{Arithmetic operations} This section discusses the three transformations involved in the encoding, along with some observations useful for an efficient implementation. We also briefly discuss the decoding strategy. The entire algorithm essentially consists of three injective mappings: two in Pass 1 (a composition and a decomposition) and one in Pass 2 (a composition). Arguably Pass 2 also performs a decomposition, taking a value in $[B^2]$ and decomposing it into two numbers in $[B]$, but this poses no practical challenges. \paragraph{Pass 1: Composition} The first transformation takes input blocks $x_i$ and $x_{i+1}$ and composes them by computing the number \begin{align*} x_i(B+1) + x_{i+1}. \end{align*} This composition can also be written as $x_iB + x_i + x_{i+1}$, and our implementation can exploit the fact that $B = 2^b$ is a power of two and perform the multiplication by shifting $x_i$ left by $b$ bits. The entire operation then boils down to a bit shift and two additions. The inverse of this operation is dividing a number $w = x_i(B+1) + x_{i+1}$ by $(B+1)$. Instead of naively dividing $w$ by $B+1$, we can observe that $w$ has the following structure: \begin{center} \begin{tabular}{r|x{3cm}|x{3cm}|}\cline{2-3} Double block ($2b$ bits) & \multicolumn{1}{c|}{$b$} & \multicolumn{1}{c|}{$b$} \tabularnewline\cline{2-3} $w=$ & $x_i + c$ & $(x_i + x_{i+1}) \bmod{B}$ \tabularnewline\cline{2-3} \end{tabular} \end{center} where the most significant $b$ bits are $x_i$ plus a carry $c\in\{0,1\}$, with $c = 0$ if and only if $x_i+x_{i+1}\in [B]$. In other words, the ``upper'' part of $w$ is either the value $x_i+1$ or $x_i$, depending on whether or not the sum $x_i + x_{i+1}$ overflowed. Compute $x' = w \gg b$ and $y' = w - x'B$. Then $x'$ equals either $x_i$ or $x_i+1$, and we can determine which by comparing $x'$ and $y'$. There are three cases to consider: \begin{itemize}[noitemsep] \item $x' > y'$. This indicates that $x_i + x_{i+1} \geq B$, and we find $x_{i}$ by subtracting one from $x'$. \item $x' = y'$. This is only possible when $x_{i+1} = 0$; in that case $x_i = x'$ and $x_{i+1} = 0$. \item $x' < y'$. The addition cannot have overflowed the block size $B$, and thus $x_i = x'$. \end{itemize} In the first and last cases above, once we have determined $x_i$, we recover $x_{i+1}$ as $y' + B - x_i$ and $y' - x_i$, respectively. \paragraph{Pass 1: Decomposition} This step decomposes a composed double block $w$ by dividing by $B-3i$ for the given $i$. Here we require an implementation of a division of a $2b$-bit number by a $b$-bit number; the details of this operation are discussed in Section~\ref{sec:impl}. Inverting this operation takes the obtained quotient and remainder, $q_i$ and $r_i$ respectively, and computes \begin{align*} w = q_i(B-3i) + r_i. \end{align*} This operation is similar to the composition from Pass 1, and the same observations apply; we can rewrite it as $q_iB - 3i\cdot q_i + r_i$ and perform the multiplication $q_iB$ as a $b$-bit shift. However, we still require a multiplication $3i\cdot q_i$, along with both addition and subtraction of $2b$-bit numbers. 
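To illustrate the case analysis above, here is a minimal sketch of the Pass 1 composition and its inverse, assuming the $b$-bit blocks are held in ordinary Python integers; the function names \texttt{compose} and \texttt{decompose} are ours.

\begin{verbatim}
def compose(x_i, x_next, b):
    # x_i*(B+1) + x_next == (x_i << b) + x_i + x_next: one shift, two additions.
    return (x_i << b) + x_i + x_next

def decompose(w, b):
    B = 1 << b
    x_prime = w >> b              # upper b bits: x_i, or x_i + 1 if a carry occurred
    y_prime = w - (x_prime << b)  # lower b bits: (x_i + x_next) mod B
    if x_prime > y_prime:         # the sum x_i + x_next overflowed (>= B)
        x_i = x_prime - 1
        return x_i, y_prime + B - x_i
    # x_prime <= y_prime: no overflow (equality means x_next = 0)
    return x_prime, y_prime - x_prime
\end{verbatim}

For any two blocks in $[B]$, applying \texttt{decompose} to the result of \texttt{compose} returns the original pair without performing a general division by $B+1$.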
\paragraph{Pass 2: Regrouped composition} From one decomposition we have a quotient $q\in[B+3i+3]$, also called ``the spill'', and we combine it with the remainder from the next decomposition, $r\in[B-3(i+1)]$. The encoding step is fairly straight-forward and in the same vein as the composition step from Pass 1: as described in the overview, we compute the enumeration \begin{align*} w = r(B+3i+3) + q \in [(B-3(i+1))(B+3(i+1))] \subseteq [B^2], \end{align*} and output $w$ as two blocks of $b$ bits each. The decoding is in principle a division by $(B+3i+3)$, but we will again exploit the structure of the composition of $w$ to obtain a faster division method.
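As a small sketch of this regrouping step (again in Python, with our own helper names, and using a plain division for the inverse in place of the faster structured division just mentioned):

\begin{verbatim}
def regroup(q, r, i, b):
    # Combine the spill q in [B+3i+3] with the next remainder r in [B-3(i+1)].
    B = 1 << b
    M = B + 3*i + 3                 # alphabet size of the spill q
    w = r * M + q                   # in [(B-3(i+1)) * (B+3(i+1))], hence < B^2
    return w >> b, w & (B - 1)      # two b-bit output blocks

def ungroup(hi, lo, i, b):
    # Inverse: reassemble w and divide by the spill's alphabet size.
    B = 1 << b
    w = (hi << b) | lo
    r, q = divmod(w, B + 3*i + 3)
    return q, r
\end{verbatim}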