Quantum Neural Networks
These are the contents of this chapter:
- Building and training quantum neural networks
- Quantum neural networks in PennyLane
- Quantum neural networks in Qiskit
Building and training a quantum neural network
From classical neural networks to quantum neural networks
| | Classical Neural Networks | Quantum Neural Networks |
|---|---|---|
| Data pre-processing | Normalizing or scaling the data | Encoding classical data via a feature map, normalizing it first if needed |
| Data processing | Feeding the data through a sequence of layers | Using a variational form (ansatz) that mimics the layered structure of a classical neural network |
| Data output | Returning the output via a final layer | The result of a measurement operation (whichever suits the problem best) |
In fact, feature maps and variational forms (ansätze) are both examples of variational circuits: quantum circuits that are controlled by classical parameters.
- Feature maps depend on the input data and are used to encode it.
- Variational forms (ansätze) depend on optimizable parameters and are used to transform a quantum input state.

Quantum neural network flow: a feature map \( F \) encodes the input, and a variational form \( V \) depends on some optimizable parameters \( \overrightarrow{\theta} \). The output of the quantum neural network is the result of a measurement operation on the final state.
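To make this flow concrete, here is a minimal PennyLane sketch; the circuit, device, and parameter shapes are our own illustrative choices, not a prescribed architecture. It uses angle encoding as the feature map, a simple entangling template as the variational form, and a Pauli-Z expectation value as the measurement.

```python
import pennylane as qml
from pennylane import numpy as np

n = 2
dev = qml.device("default.qubit", wires=n)

@qml.qnode(dev)
def qnn(x, theta):
    qml.AngleEmbedding(x, wires=range(n))            # feature map F, depends on the data x
    qml.BasicEntanglerLayers(theta, wires=range(n))  # variational form V, depends on theta
    return qml.expval(qml.PauliZ(0))                 # measurement operation

x = np.array([0.3, 0.7])
theta = np.random.random((1, n))  # one layer of optimizable parameters
print(qnn(x, theta))
```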
Variational forms
In principle, variational forms for QNNs follow a "layered structure," trying to mimic the spirit of classical neural networks.
To define a variational form with \(k\) layers, we consider \(k\) vectors of independent parameters \( \overrightarrow{\theta_{1}}, \ldots, \overrightarrow{\theta_{k}} \). For each layer \(j\), we take a variational circuit \( G_j \) depending on the parameters \( \overrightarrow{\theta_{j}} \). A common approach is to prepare variational forms by stacking these variational circuits consecutively, separating them by circuits \( U_{ent}^{t} \), independent of any parameters, meant to create entanglement between the qubits.

A variational form with \(k\) layers, each defined by a variational circuit \( G_j \) dependent on some parameters \( \overrightarrow{\theta_{j}} \). The circuits \( U_{ent}^{t} \) are used to create entanglement, and the state \( | \psi_{enc} \rangle \) denotes the output of the feature map.
Two-local
The two-local form with \(k\) repetitions on \(n\) qubits relies on \( n \times (k+1) \) optimizable parameters, which we will denote as \( \theta_{rj} \) with \( r = 0, \ldots, k \) and \( j = 1, \ldots, n \). The two-local circuit is constructed as
```
# TWOLOCAL(n, k, theta)
for r in range(k + 1):  # for all r = 0, ..., k
    # Add the r-th layer of rotations
    for j in range(1, n + 1):  # for all j = 1, ..., n
        # DO: apply an Ry(theta_{r,j}) gate on qubit j
    # Create entanglement between consecutive layers
    if r < k:
        for t in range(1, n):  # for all t = 1, ..., n - 1
            # DO: apply a CNOT gate with control on qubit t and target on qubit t + 1
```
The figure below shows the two-local form with \( n = 4 \) and \( k = 3 \). The two-local variational form uses the same circuit as the angle encoding feature map for its layers, and then it relies on a cascade of controlled-NOT operations in order to create entanglement (see the CNOT gates after each \( R_{Y}(\theta_{rj}) \) layer). Note that the two-local variational form with \(k\) repetitions has \(k+1\) layers, not \(k\). The two-local variational form is very versatile, and it can be used with any measurement operation.
For a two-local variational form, there are more options for the distribution of gates in the entanglement circuit besides the "linear" model we have just covered. See the figure below for diagrams of the circular and full entanglement circuits.
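As a rough illustration, the linear-entanglement two-local form could be written in PennyLane as follows; the function name and parameter layout are our own choices (Qiskit users can find an analogous ready-made `TwoLocal` circuit, with `linear`, `circular`, and `full` entanglement options, in its circuit library).

```python
import pennylane as qml

def two_local(theta, n, k):
    """Apply the two-local form; theta has shape (k + 1, n)."""
    for r in range(k + 1):
        for j in range(n):
            qml.RY(theta[r, j], wires=j)    # rotation layer r
        if r < k:  # no entangling block after the final rotation layer
            for t in range(n - 1):
                qml.CNOT(wires=[t, t + 1])  # linear entanglement
```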
Tree tensor
The tree tensor variational form with \(k+1\) layers can be applied on \( n = 2^{k} \) qubits. The variational form relies on \( 2^{k} + 2^{k-1} + \cdots + 1 = 2^{k+1} - 1 \) optimizable parameters of the form \( \theta_{r,s} \), where \( r = 0, \ldots, k \) indexes the layer and \( s = 0, \ldots, 2^{k-r} - 1 \) indexes the rotations within it.
The tree tensor circuit is constructed as
```
# TREETENSOR(k, theta)
# DO: on each qubit j, apply a rotation Ry(theta_{0,j})
for r in range(1, k + 1):  # for all r = 1, ..., k
    for s in range(2**(k - r)):  # for all s = 0, ..., 2^{k-r} - 1
        # DO: apply a CNOT with target on qubit 1 + s*2^r and control on qubit 1 + s*2^r + 2^{r-1}
        # DO: apply a rotation Ry(theta_{r,s}) on qubit 1 + s*2^r
```
The tree tensor variational form fits best in quantum neural networks designed to work as binary classifiers. The most natural measurement operation to use in conjunction with it is obtaining the expectation value of the first qubit, as measured in the computational basis.
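A hedged PennyLane sketch of this construction, using 0-indexed wires and a flat parameter vector (both our own conventions):

```python
import pennylane as qml

def tree_tensor(theta, k):
    """Apply the tree tensor form on n = 2**k qubits; theta holds 2**(k + 1) - 1 angles."""
    n = 2**k
    idx = 0
    for j in range(n):  # initial rotation layer
        qml.RY(theta[idx], wires=j)
        idx += 1
    for r in range(1, k + 1):
        for s in range(2**(k - r)):
            target = s * 2**r              # qubit 1 + s*2^r, 0-indexed
            control = target + 2**(r - 1)  # qubit 1 + s*2^r + 2^{r-1}, 0-indexed
            qml.CNOT(wires=[control, target])
            qml.RY(theta[idx], wires=target)
            idx += 1
```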
Strongly entangling layers
The strongly entangling layers variational form acts on \(n\) qubits and can have any number \(k\) of layers. Each layer \(l\) is given a range \( r_l \). The variational form uses \( 3nk \) parameters of the form \( \theta_{l,j,i} \), with \( l = 1, \ldots, k \), \( j = 1, \ldots, n \), and \( i = 1, 2, 3 \).
The strongly entangling layer circuit is constructed as
```
# STRONGLYENTANGLINGLAYERS(n, k, r, theta)
for l in range(1, k + 1):  # for all l = 1, ..., k
    for j in range(1, n + 1):  # for all j = 1, ..., n
        # DO: apply a rotation Rz(theta_{l,j,1}) on qubit j
        # DO: apply a rotation Ry(theta_{l,j,2}) on qubit j
        # DO: apply a rotation Rz(theta_{l,j,3}) on qubit j
    for j in range(1, n + 1):  # for all j = 1, ..., n
        # DO: apply a CNOT controlled by qubit j with target on qubit ((j + r_l - 1) mod n) + 1
```
The choice to use mostly Y rotations in the previous examples of variational forms is somewhat arbitrary; we could equally have used X rotations, or a different controlled operation.
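PennyLane ships this variational form as a built-in template, so in practice you rarely need to write it by hand. A minimal usage sketch (the device size and layer count are arbitrary choices of ours):

```python
import pennylane as qml
from pennylane import numpy as np

n, k = 4, 2
dev = qml.device("default.qubit", wires=n)

@qml.qnode(dev)
def circuit(weights):
    qml.StronglyEntanglingLayers(weights, wires=range(n))
    return qml.expval(qml.PauliZ(0))

shape = qml.StronglyEntanglingLayers.shape(n_layers=k, n_wires=n)  # (k, n, 3)
weights = np.random.random(size=shape)
print(circuit(weights))
```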
Measurements
From VQE, we know that any physical observable can be represented by a Hermitian operator, in such a way that all the possible outcomes of measuring the observable can be matched to the different eigenvalues of the operator.
When we measure a single qubit in the computational basis, the coordinate matrix with respect to the computational basis of the associated Hermitian operator could well be either of
$$ M = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$
Both of these operators represent the measurement of a qubit, but they differ in the eigenvalues that they associate to the distinct outputs.
- The \(M\) operator associates the eigenvalues 1 and 0 to the qubit's value being 0 and 1, respectively.
- The \(Z\) operator associates the eigenvalues 1 and -1 to the qubit's value being 0 and 1, respectively.
Practice
Show that the operators \(M\) and \(Z\) can be written as $$ M = 1|0\rangle \langle 0 | + 0|1\rangle\langle1| = |0\rangle\langle0|, \quad Z = |0\rangle\langle0|-|1\rangle\langle1| $$
Answer
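A short derivation: by the spectral theorem, a Hermitian operator is the sum of its eigenvalues times the projectors onto the corresponding eigenvectors. Applying this with the eigenvalues listed above gives

$$ M = 1 \cdot |0\rangle\langle0| + 0 \cdot |1\rangle\langle1| = |0\rangle\langle0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad Z = 1 \cdot |0\rangle\langle0| + (-1) \cdot |1\rangle\langle1| = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$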
As we know, PennyLane allows you to work with measurement operations defined by any Hermitian operator.
In an \(n\)-qubit circuit, you will be able to instruct PennyLane to compute the expectation value of the observable \( M \otimes \cdots \otimes M \), which has as its coordinate representation in the computational basis the matrix
$$ M \otimes \cdots \otimes M = |0 \cdots 0\rangle \langle 0 \cdots 0| = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}. $$
You can also consider the observable \( Z \otimes \cdots \otimes Z \). This observable returns +1 if an even number of qubits are measured as 1, and -1 otherwise. For this reason, \( Z \otimes \cdots \otimes Z \) is also called a parity observable.
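As a quick illustration (the two-qubit circuit here is our own toy example), both kinds of observables can be measured in PennyLane, using `qml.Hermitian` for the custom \(M \otimes M\) matrix and an operator product for the parity observable:

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)
M = np.array([[1.0, 0.0], [0.0, 0.0]])  # eigenvalues 1 and 0

@qml.qnode(dev)
def expval_M(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.Hermitian(np.kron(M, M), wires=[0, 1]))

@qml.qnode(dev)
def expval_parity(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

print(expval_M(0.5), expval_parity(0.5))
```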
Gradient computation and the parameter shift rule
Now that we have all the building blocks, let's see how we can train our QNN model. To do so, we bring back our old friend: the optimization algorithm.
The optimization algorithm that we shall use for quantum neural networks is gradient descent; in particular, the Adam optimizer. In quantum optimization, the Adam optimizer needs to obtain the gradient of the expected value of a loss function with respect to the optimizable parameters.
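Before discussing how those gradients are obtained, here is a hedged end-to-end training sketch in PennyLane, with a toy dataset and a squared loss; all names, shapes, and hyperparameters are our own assumptions.

```python
import pennylane as qml
from pennylane import numpy as np

n = 2
dev = qml.device("default.qubit", wires=n)

@qml.qnode(dev)
def qnn(x, theta):
    qml.AngleEmbedding(x, wires=range(n))            # feature map
    qml.BasicEntanglerLayers(theta, wires=range(n))  # variational form
    return qml.expval(qml.PauliZ(0))                 # measurement

X = np.array([[0.1, 0.2], [0.4, 0.3]])  # toy inputs
y = np.array([1.0, -1.0])               # toy targets

def loss(theta):
    return sum((qnn(x, theta) - t) ** 2 for x, t in zip(X, y)) / len(X)

opt = qml.AdamOptimizer(stepsize=0.1)
theta = np.random.random((3, n), requires_grad=True)
for _ in range(50):
    theta = opt.step(loss, theta)
```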
Numerical approximation
If we had a real-valued function taking \(n\) real inputs, \( f: \mathbb{R}^{n} \rightarrow \mathbb{R} \), we could approximate its partial derivatives as
$$ \frac{\partial f}{\partial x_{j}}(\overrightarrow{x}) \approx \frac{f(\overrightarrow{x} + h\overrightarrow{e_{j}}) - f(\overrightarrow{x})}{h}, $$
where \( \overrightarrow{e_{j}} \) is the \(j\)-th vector of the standard basis and \(h\) is a sufficiently small value.
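For instance, a generic forward-difference helper in plain Python (our own sketch, not tied to any quantum framework):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x by forward differences."""
    grad = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h  # step h in the j-th coordinate direction
        grad[j] = (f(x + e) - f(x)) / h
    return grad

print(numerical_gradient(lambda v: v[0]**2 + 3 * v[1], np.array([1.0, 2.0])))
# approximately [2., 3.]
```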
Automatic differentiation
Given the current state of real quantum hardware, odds are that most of the quantum neural networks that you will train will run on simulators.
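On a simulator, PennyLane can differentiate a QNode end to end, just like a classical machine learning framework. A minimal sketch using backpropagation on `default.qubit`:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, diff_method="backprop")
def E(theta):
    qml.RX(theta, wires=0)
    return qml.expval(qml.PauliZ(0))

theta = np.array(0.7, requires_grad=True)
print(qml.grad(E)(theta))  # approximately -sin(0.7)
```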
The parameter shift rule
By using the parameter shift rule, you can compute gradients when executing quantum neural networks on real quantum hardware. This technique enables us to compute gradients by using the same circuit as the quantum neural network, yet shifting the values of the optimizable parameters. This technique cannot always be applied, but it works in many common cases and can be used in conjunction with other techniques, such as numerical approximation.
For example, if you had a circuit consisting of a single rotation gate \( R_{X}(\theta) \) and you measured an expectation value \( E(\theta) \), you would be able to compute its derivative with respect to \( \theta \) as
$$ \frac{\mathrm{d}E}{\mathrm{d}\theta}(\theta) = \frac{E\left(\theta + \frac{\pi}{2}\right) - E\left(\theta - \frac{\pi}{2}\right)}{2}. $$
This is analogous to how the derivative of the sine function can be written in terms of shifted values of the sine function itself: \( \sin'(\theta) = \cos(\theta) = \frac{\sin(\theta + \pi/2) - \sin(\theta - \pi/2)}{2} \).
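The following snippet (our own check, which additionally assumes a Pauli-Z measurement so that \( E(\theta) = \cos\theta \)) compares the parameter-shift formula with the exact derivative for this single-rotation circuit:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def E(theta):
    qml.RX(theta, wires=0)
    return qml.expval(qml.PauliZ(0))  # E(theta) = cos(theta)

theta = 0.7
shift = (E(theta + np.pi / 2) - E(theta - np.pi / 2)) / 2
print(shift, -np.sin(theta))  # both approximately -sin(0.7)
```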
Note
When quantum neural networks are run on simulators, gradients can be computed using automatic differentiation techniques analogous to those of classical machine learning. Alternatively, numerical approximation is always available as a fallback for computing gradients.
Note
Everything looks good and promising, but quantum neural networks also pose some challenges when it comes to training them.
- They are known to be vulnerable to barren plateaus: situations in which the training gradients vanish and, thus, the training can no longer progress (see the paper by McClean et al. [2] for further explanation).
- It is also known that the kind of measurement operation used and the depth of the QNN play a role in how likely these barren plateaus are to be found (see the paper by Cerezo and collaborators [3]).
- In any case, you should be vigilant when training your QNNs, and follow the literature for possible solutions should barren plateaus threaten the learning of your models.
Practical usage of quantum neural networks
Here is a collection of ideas that you should keep in mind when designing and training QNN models.
- Make a wise choice: Choose your feature map, variational form, and measurement operation properly. Be intentional about these choices and consider the problem and the data that you are working with; a poor choice can lead to barren plateaus. Try to build on well-established cases from the literature.
- Size matters: When you use a well-designed variational form, such as two-local, tree tensor, or strongly entangling layers, the power of the resulting quantum neural network will be directly related to the number of optimizable parameters it has.
- Optimize optimization: For most problems, the Adam optimizer can be your go-to choice for training a quantum neural network.
- Feed your QNN properly: The data that is fed to a quantum neural network should be normalized according to the requirements of the feature map in use. Also, depending on the complexity of the problem, consider using dimensionality reduction techniques.
If you want to boost the power of your QNN, you may consider using the data re-uploading technique [4]. Instead of applying the feature map \( F \), dependent on the data \( \overrightarrow{x} \), only once, you repeat the block of the feature map \( F \) followed by a variational form \( V \) with its own optimizable parameters \( \overrightarrow{\theta_{j}} \) as many times as you want before performing the measurement operation of the QNN.
This has been shown, both in practice and in theory [5], to offer some advantages over the simpler, standard approach at the cost of increasing the depth of the circuits that are used.
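A minimal re-uploading sketch in PennyLane (our own illustrative construction, with arbitrary circuit choices), where the encoding is repeated before each variational layer:

```python
import pennylane as qml
from pennylane import numpy as np

n, reps = 2, 3
dev = qml.device("default.qubit", wires=n)

@qml.qnode(dev)
def reuploading_qnn(x, theta):
    """theta has shape (reps, n): one Ry angle per qubit per repetition."""
    for r in range(reps):
        qml.AngleEmbedding(x, wires=range(n))  # re-upload the data
        for j in range(n):
            qml.RY(theta[r, j], wires=j)       # variational layer
        qml.CNOT(wires=[0, 1])                 # entangle
    return qml.expval(qml.PauliZ(0))

print(reuploading_qnn(np.array([0.3, 0.7]), np.random.random((reps, n))))
```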
References
1. E. F. Combarro and S. González-Castillo, A Practical Guide to Quantum Machine Learning and Quantum Optimisation: Hands-on Approach to Modern Quantum Algorithms. Packt Publishing, 2023.
2. J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, "Barren plateaus in quantum neural network training landscapes," Nature Communications, vol. 9, no. 1, p. 4812, 2018.
3. M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, "Cost function dependent barren plateaus in shallow parametrized quantum circuits," Nature Communications, vol. 12, no. 1, pp. 1–12, 2021.
4. A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, "Data re-uploading for a universal quantum classifier," Quantum, vol. 4, p. 226, 2020.
5. M. Schuld, R. Sweke, and J. J. Meyer, "Effect of data encoding on the expressive power of variational quantum-machine-learning models," Phys. Rev. A, vol. 103, p. 032430, 2021.
6. M. C. Caro, H.-Y. Huang, M. Cerezo, et al., "Generalization in quantum machine learning from few training data," Nature Communications, vol. 13, no. 1, p. 4919, 2022.