
Quantum Machine Learning

Quantum machine learning is the extension of machine learning to the realm of quantum computation. Still in its infancy (as are quantum computers themselves), QML is a cutting-edge field with, as yet, unsettled foundations. Several different approaches exist, including quantum kernels, support vector machines, and quantum neural networks. Currently implementable and effective models tend to use hybrid schemes, in which classical machine learning techniques interact with quantum submodules to get the best of both worlds (and to work around limited QC hardware). The overview given below is encapsulated in this presentation.

Quantum Neural Networks

One proposed framework for quantum neural networks is that of Kerstin Beer et al.'s Training Deep Quantum Neural Networks; its formalism is described and discussed below. Note that this formalism works with states represented as density matrices (necessarily so, since we intend to take partial traces over hidden layers).

Overview of Framework

In analogy to the classical case of neural networks, we essentially have to redefine four key components to obtain a working model: first, we must specify the appropriate form of the data; then, some general architecture that can act and be trained on this data; some way to quantitatively judge a given model's performance on the training data; and finally, some way to accomplish this training.

Data

The most general task a quantum circuit can perform is an arbitrary unitary operation on an arbitrary-dimensional space. Hence, an effective quantum neural network should aim to approximate an arbitrary unitary given some set of data \( (\vert\psi_i\rangle, V \vert \psi_i\rangle) \), where \(i\) ranges from 1 to \(N\) for \(N\) points of data. Effectively, then, our quantum network should be able to 'learn' to approximate what unitary operation is being performed given \(N\) 'snapshots' of its action on some set of states.
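As a concrete illustration, such a data set can be generated classically by sampling a Haar-random target unitary \(V\) and random input kets. A minimal NumPy sketch (the function names and the choice of two-qubit states are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via the QR decomposition."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases                  # multiply column j of q by phase j

def random_ket(d, rng):
    """Sample a random normalized state vector."""
    psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return psi / np.linalg.norm(psi)

d, N = 4, 10                           # e.g. two-qubit states, N training pairs
V = haar_unitary(d, rng)               # the 'unknown' target unitary
training_data = []
for _ in range(N):
    psi = random_ket(d, rng)
    training_data.append((psi, V @ psi))   # (|psi_i>, V|psi_i>)
```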

Architecture

The overall action of the network is a layer-by-layer composition of the transition maps \(\epsilon^{\ell}\), one for each layer \(\ell\) with \(in\leq \ell\leq out\). Each layer may have a different number of qubits \(m_{\ell}\). Explicitly, the \(\ell\)-th layer's transition map takes the form: \[ \epsilon^{\ell}(\rho_{\ell-1})= \text{Tr}_{\ell-1}\left[ \left(\prod_{m=1}^{m_{\ell}}U^{\ell}_{m}\right)\left(\left(\vert 0\rangle ^{\otimes m_{\ell}}\langle 0\vert^{\otimes m_{\ell}}\right)_{\ell}\otimes \rho_{\ell-1}\right) \left(\prod_{m=m_{\ell}}^{1}U^{\ell\dagger}_{m}\right) \right] = \rho_{\ell} \] Hence, a total circuit of \(L\) hidden layers returns \(\rho_{out}\), defined below, for a given input state \(\rho_{in}\). \[ \rho_{out} = \epsilon^{out}\left(\epsilon^{L}\left(\epsilon^{L-1}\left( ... \epsilon^{1}\left(\rho_{in}\right) ...\right)\right)\right) \]

Architecture: Step-by-step

For each layer \(\ell\),

1. The \(\ell\)-th layer's \(m_{\ell}\) qubits are prepared in the initial state \(\left(\vert 0\rangle ^{\otimes m_{\ell}}\langle 0\vert^{\otimes m_{\ell}}\right)_{\ell}\) and tensored with the previous layer's output \(\rho_{\ell-1}\). \[ \rho_{\ell}' = \left(\vert 0\rangle ^{\otimes m_{\ell}}\langle 0\vert^{\otimes m_{\ell}}\right)_{\ell}\otimes \rho_{\ell-1} \]
2. The \(\ell\)-th layer's \(m_{\ell}\) associated unitary matrices \(U^{\ell}_m\) are applied to this tensor product state (from top to bottom). \[ \rho_{\ell}'' = \left(\prod_{m=1}^{m_{\ell}}U^{\ell}_{m}\right)\left(\rho_{\ell}'\right) \left(\prod_{m=m_{\ell}}^{1}U^{\ell\dagger}_{m}\right) \]
3. The partial trace over the \( (\ell-1)\)-th layer's Hilbert space is taken, resulting in the output state \(\rho_{\ell}\) of the \(\ell\)-th layer. \[ \rho_{\ell} = \text{Tr}_{\ell-1} [\rho_{\ell}''] \]
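The three steps translate directly into code. Below is a minimal NumPy sketch of a single layer's transition map \(\epsilon^{\ell}\); the function name and the qubit-ordering convention (the new layer as the first tensor factor, matching the general formula above) are assumptions of the sketch, not the authors' implementation:

```python
import numpy as np

def layer_channel(rho_prev, unitaries, m_prev, m_curr):
    """Apply one QNN layer map: rho_{l-1} -> rho_l.

    rho_prev  : density matrix of layer l-1, dimension 2**m_prev
    unitaries : list of unitaries U^l_1 ... U^l_{m_l}, each of dimension
                2**(m_curr + m_prev), applied in order (top to bottom)
    """
    d_prev, d_curr = 2 ** m_prev, 2 ** m_curr

    # Step 1: new layer prepared in |0...0><0...0|, tensored with rho_{l-1}
    zeros = np.zeros((d_curr, d_curr), dtype=complex)
    zeros[0, 0] = 1.0
    rho = np.kron(zeros, rho_prev)

    # Step 2: apply the layer's perceptron unitaries
    for U in unitaries:
        rho = U @ rho @ U.conj().T

    # Step 3: partial trace over the (l-1)-th layer (second tensor factor)
    rho = rho.reshape(d_curr, d_prev, d_curr, d_prev)
    return np.trace(rho, axis1=1, axis2=3)
```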

Cost

We now need a metric by which we may judge the performance of the network on the training data.

The cost is then defined as the fidelity averaged over the output states and the corresponding training data states, as below: \[C =\frac{1}{N}\sum_{i=1}^{N}\langle \psi_i\vert V^{\dagger} \rho_i^{out}V\vert\psi_i\rangle \] Note that, since fidelity is one when two states correspond exactly, we wish to maximize this cost function (as opposed to the usual case in classical machine learning where we wish to minimize the corresponding cost function).
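For pure-state data this cost is just an average of expectation values. A minimal sketch, assuming target_kets holds the \(V\vert\psi_i\rangle\) and output_states the corresponding \(\rho_i^{out}\):

```python
import numpy as np

def cost(target_kets, output_states):
    """C = (1/N) sum_i <psi_i| V^dag rho_i^out V |psi_i>."""
    fidelities = [np.real(phi.conj() @ rho @ phi)
                  for phi, rho in zip(target_kets, output_states)]
    return float(np.mean(fidelities))
```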

Further, note that this cost function is only applicable for training data based on pure states, for which the fidelity takes an especially nice form. For data based on mixed states, we may replace the above with an averaged fidelity between output and target states. Explicitly then, for this case, we have the following: \[ C=\frac{1}{N}\sum_{i=1}^{N}\left(\text{Tr}\left[\sqrt{\sqrt{\rho_i}\rho^{out}_i\sqrt{\rho_i}}\right]\right)^2 \]
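A corresponding sketch for the mixed-state case, using SciPy's matrix square root (here target_states holds the \(\rho_i\) and output_states the \(\rho_i^{out}\); these names are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def mixed_state_cost(target_states, output_states):
    """C = (1/N) sum_i (Tr[sqrt(sqrt(rho_i) rho_i^out sqrt(rho_i))])^2."""
    total = 0.0
    for rho, rho_out in zip(target_states, output_states):
        s = sqrtm(rho)
        total += np.real(np.trace(sqrtm(s @ rho_out @ s))) ** 2
    return total / len(target_states)
```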

Training

The issue now is training, which can be done iteratively by updating the network's constituent unitaries via the following relation, parametrized by the step size \(\epsilon\): \[U_{m}^{\ell} \rightarrow e^{-\epsilon K_{m}^{\ell}}U_{m}^{\ell}\]

where
\[ K_{m}^{\ell} = \eta \frac{2^{m_{\ell-1}}}{N}\sum_{i = 1}^{N}\text{Tr}_{\neg \ell,m} \Bigg[\left(\prod_{n=1}^{m} U^{\ell}_{n}\right) \left(\left(\vert 0\rangle ^{\otimes m_{\ell}}\langle 0\vert ^{\otimes m_{\ell}}\right)_{\ell}\otimes\rho_{i}^{\ell-1}\right)\left(\prod_{n=m}^{1} U^{\ell\dagger}_{n}\right), \left(\prod_{n=m_{\ell}}^{m+1} U^{\ell\dagger}_{n}\right) \left(\sigma_i^{\ell}\otimes\mathbb{I}_{\ell-1}\right)\left(\prod_{n=m+1}^{m_{\ell}} U^{\ell}_{n}\right) \Bigg] \]
where the square brackets denote a commutator, and \(\sigma_i^{\ell} = \mathcal{F}^{\ell+1}(... \mathcal{F}^{out}(\rho_{i}^{out})...)\), where \(\mathcal{F}^{\ell}\) is the adjoint channel of the layer-to-layer transition map \(\epsilon^{\ell}\). This form for the training matrix may be derived from a first-order approximation of the cost function's gradient; for the details of this derivation, see the paper's associated supplementary resources.
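Once each \(K_m^{\ell}\) has been assembled from the averaged commutator above, the update itself is a single matrix exponential per perceptron unitary. A minimal sketch of one such step (the function name is illustrative; assembling the \(K\) matrices themselves is not shown):

```python
from scipy.linalg import expm

def update_unitaries(unitaries, K_matrices, eps):
    """One training step: U_m^l <- exp(-eps * K_m^l) U_m^l for every perceptron."""
    return [expm(-eps * K) @ U for U, K in zip(unitaries, K_matrices)]
```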

Note: Discrepancy with Original Paper

In the original paper, the update matrix acts on its unitary much as a generator of a Lie algebra acts on its associated group elements, as below: \[ U \rightarrow e^{iK}U \] a convention which we have not adopted here. However, by this definition \(K\) must be Hermitian for \(U\) to remain unitary; and in the original paper, \(K\), as defined, is skew-Hermitian. This requires us either to add a complex factor of \(i\) to \(K\) or to remove the factor of \(i\) from the map. We choose to do the latter here. In fact, upon analysis of their associated code, the authors neglect both factors of \(i\) and hence update according to the map \(U\rightarrow e^{-K} U\) as well.

Proof. It is well known that products of unitaries are unitary and that the generators of unitaries are Hermitian. However, by the construction of \(K\) (according to the same authors), \(K\) is the commutator of two density matrices, which are Hermitian, and the commutator of two Hermitian matrices is skew-Hermitian:

\[ [A,B]^{\dagger}=(AB-BA)^{\dagger}=B^{\dagger}A^{\dagger}-A^{\dagger}B^{\dagger}=-[A,B] \]
and hence, by the defined map, \(K\) must also be multiplied by an extra factor of the complex unit \(i\) to make it Hermitian.
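The claim is also easy to check numerically, e.g. with random Hermitian matrices:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def random_hermitian(d, rng):
    """Sample a random d x d Hermitian matrix."""
    a = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (a + a.conj().T) / 2

A, B = random_hermitian(4, rng), random_hermitian(4, rng)
comm = A @ B - B @ A
print(np.allclose(comm.conj().T, -comm))              # True: [A, B] is skew-Hermitian
print(np.allclose((1j * comm).conj().T, 1j * comm))   # True: i[A, B] is Hermitian
```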

Adjoint Layer

The adjoint layer is analogous to the backward pass of a classical neural network, and is necessary for the computation of the training matrix (at least in the form given above).

\[ \mathcal{F}^{\ell+1}(\rho_{\ell+1}) =\text{Tr}_{\ell+1}\left[\left(\left(\vert 0\rangle ^{\otimes m_{\ell+1}}\langle 0\vert^{\otimes m_{\ell+1}}\right)_{\ell+1}\otimes \mathbb{I}_{\ell}\right) \left(\prod_{m=1}^{m_{\ell+1}}U^{\ell+1\dagger}_{m}\right)\left(\rho_{\ell+1}\otimes\mathbb{I}_{\ell} \right) \left(\prod_{m=m_{\ell+1}}^{1}U^{\ell+1}_{m}\right) \right] = \rho_{\ell} \]
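A minimal NumPy sketch of this adjoint channel, using the same qubit-ordering convention as the forward-pass sketch above (layer \(\ell+1\) as the first tensor factor); the function name is an assumption of the sketch:

```python
import numpy as np

def adjoint_channel(rho_next, unitaries, m_curr, m_next):
    """Adjoint map F^{l+1}: state on layer l+1 -> state on layer l.

    rho_next  : density matrix on layer l+1, dimension 2**m_next
    unitaries : the same list U^{l+1}_1 ... U^{l+1}_{m_{l+1}} used in the forward pass
    """
    d_curr, d_next = 2 ** m_curr, 2 ** m_next

    # Projector onto |0...0> of layer l+1, tensored with identity on layer l
    zeros = np.zeros((d_next, d_next), dtype=complex)
    zeros[0, 0] = 1.0
    proj = np.kron(zeros, np.eye(d_curr))

    # Conjugate rho_{l+1} (x) I_l by the daggered layer unitaries
    op = np.kron(rho_next, np.eye(d_curr))
    for U in reversed(unitaries):
        op = U.conj().T @ op @ U
    op = proj @ op

    # Partial trace over layer l+1 (the first tensor factor)
    op = op.reshape(d_next, d_curr, d_next, d_curr)
    return np.trace(op, axis1=0, axis2=2)
```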

\(2\times 3\times 2\) Qubit QNN

Our simple example of the model will consist of a two-qubit input state, one three-qubit hidden layer, and a two-qubit output state, closely following the presentation given by Ramona Wolf.

2x3x2 Example Quantum Neural Network

Forward Pass

\[ \rho_{in} \rightarrow \rho_{out}= \text{Tr}_{1}\Big[ U^{2}_2U^{2}_1\Big(\text{Tr}_{in}\left[U^1_3U^{1}_2U^{1}_1\left(\rho_{in}\otimes \vert 000\rangle\langle000\vert_{1}\right)U^{1\dagger}_1U^{1\dagger}_2U^{1\dagger}_3\right]\otimes \vert 00\rangle\langle00\vert_{2}\Big)U^{2\dagger}_1U^{2\dagger}_2 \Big] \]
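Assuming the haar_unitary, random_ket, and layer_channel sketches from earlier sections, this forward pass amounts to two applications of the transition map (the variable names are illustrative):

```python
import numpy as np

# Reuses haar_unitary, random_ket, and layer_channel from the sketches above.
rng = np.random.default_rng(seed=2)

U_hidden = [haar_unitary(2 ** 5, rng) for _ in range(3)]   # U^1_1, U^1_2, U^1_3 on 2+3 qubits
U_out    = [haar_unitary(2 ** 5, rng) for _ in range(2)]   # U^out_1, U^out_2 on 3+2 qubits

psi    = random_ket(4, rng)                 # a two-qubit input ket
rho_in = np.outer(psi, psi.conj())          # as a density matrix
rho_int = layer_channel(rho_in,  U_hidden, m_prev=2, m_curr=3)
rho_out = layer_channel(rho_int, U_out,    m_prev=3, m_curr=2)
print(np.isclose(np.trace(rho_out), 1.0))   # the channel is trace-preserving
```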

Training Matrices

\[ K^1_1 =\eta\frac{2^2}{N}\sum_{i=1}^N \text{Tr}_{2,3_{int}}\left[U_1^1\left(\rho^{in}_i\otimes \vert 000\rangle\langle000\vert_1\right)U^{1\dagger}_1,U^{1\dagger}_2U^{1\dagger}_3\left(\mathbb{I}_{2^2}\otimes\sigma_i^1\right)U^{1}_3U^{1}_2\right] \] \[ K^1_2 = \eta\frac{2^2}{N}\sum_{i=1}^N \text{Tr}_{1,3_{int}}\left[U_2^1U_1^1\left(\rho^{in}_i\otimes \vert 000\rangle\langle000\vert_1\right)U^{1\dagger}_1U^{1\dagger}_2,U^{1\dagger}_3\left(\mathbb{I}_{2^2}\otimes\sigma_i^1\right)U_3^1\right] \] \[ K^1_3 = \eta\frac{2^2}{N}\sum_{i=1}^N \text{Tr}_{1,2_{int}}\left[U_3^1U_2^1U_1^1\left(\rho^{in}_i\otimes \vert 000\rangle\langle000\vert_1\right)U^{1\dagger}_1U^{1\dagger}_2U^{1\dagger}_3,\left(\mathbb{I}_{2^2}\otimes\sigma_i^1\right)\right] \]

\[ K^{out}_1 =\eta\frac{2^3}{N}\sum_{i=1}^N \text{Tr}_{2_{out}}\left[U_1^{out}\left(\rho^{int}_i\otimes \vert 00\rangle\langle00\vert_{out}\right)U^{out\dagger}_1,U^{out\dagger}_2\left(\mathbb{I}_{2^3}\otimes\rho_i^{out}\right)U^{out}_2\right] \] \[ K^{out}_2 =\eta\frac{2^3}{N}\sum_{i=1}^N \text{Tr}_{1_{out}}\left[U^{out}_2U_1^{out}\left(\rho^{int}_i\otimes \vert 00\rangle\langle00\vert_{out}\right)U^{out\dagger}_1U^{out\dagger}_2,\left(\mathbb{I}_{2^3}\otimes\rho_i^{out}\right)\right] \]
where \(\rho^{int}_i = \varepsilon^{1}(\rho^{in}_i)\) is the network's hidden-layer state and \(\sigma_i^1=\mathcal{F}^{out}(\rho^{out}_i)\), with \(\rho^{out}_i\) here (as in the \(K^{out}\) expressions above) denoting the target output state taken from the training data rather than the network's own output \(\varepsilon^{out}(\varepsilon^{1}(\rho^{in}_i))\).

Adjoint Layer

\[ \mathcal{F}^{out}(\rho_{out}) = \text{Tr}_{out}\left[ \left(\vert 00\rangle\langle 00 \vert_{out} \otimes \mathbb{I}_{2^3} \right) U^{out\dagger}_1 U^{out\dagger}_2 (\rho_{out}\otimes\mathbb{I}_{2^3})U^{out}_2 U^{out}_1 \right] =\rho_{int} \]

Simulation & Resources

For a simulated version of this example network in QuTiP, download this Jupyter notebook or see this report.