You are right that the two main power transistors (the TIP31C and TIP32C) are driven as Darlington pairs with their respective driver transistors (the 2N3904 and 2N3906, connected to their bases).
The three diodes are there to eliminate crossover distortion, and although the diagram shows them asymmetrically, they actually apply a voltage offset symmetrically to the bases of the driver transistors.
There needs to be about one diode drop of voltage (0.6V) between the base and emitter of a transistor to make it conduct. Without any diodes, and with a class B output stage made with single output transistors (but see below), there would be a 1.2V "dead zone" in the input where both output transistors were turned off. This would lead to horrible sound quality - human ears are really sensitive to crossover distortion. The additional bias voltage keeps both transistors turned on in this crossover region.
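
To make the dead zone concrete, here is a minimal sketch of an idealised push-pull follower in Python. It is a toy model, not a simulation of this circuit: the hard 0.6V threshold, the function name `class_b_output`, and the `bias` parameter are all assumptions made for illustration.

```python
import math

V_BE = 0.6  # assumed base-emitter turn-on voltage in volts (idealised hard threshold)

def class_b_output(v_in, bias=0.0):
    """Toy push-pull emitter follower with single output transistors.

    Each half conducts only once the input, helped by any bias voltage,
    exceeds one V_BE drop. Returns the output voltage in volts.
    """
    dead = max(V_BE - bias, 0.0)  # dead band remaining on each side of zero
    if v_in > dead:
        return v_in - dead
    if v_in < -dead:
        return v_in + dead
    return 0.0  # both transistors off: the crossover dead zone

# One cycle of a 2V peak sine wave, sampled at 16 points.
samples = [2.0 * math.sin(2 * math.pi * n / 16) for n in range(16)]
unbiased = [class_b_output(v) for v in samples]            # flattened near zero
biased = [class_b_output(v, bias=V_BE) for v in samples]   # follows the input

for v, u, b in zip(samples, unbiased, biased):
    print(f"in={v:+.2f}V  no bias={u:+.2f}V  one diode drop of bias={b:+.2f}V")
```

With no bias, every input between -0.6V and +0.6V produces zero output, which is the 1.2V dead zone. With one diode drop of bias per side, the model follows the input through the crossover region.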
So two of the three diodes are there to supply 0.6V of additional bias voltage to each of the driver transistor bases.
The third diode adds an extra 0.6V so that there is a constant quiescent current through the output emitter resistors (1.2 ohm). These resistors serve three purposes.
First, they set the quiescent current, which would be 0.6V / 2.4 ohm = 250mA with no input signal (the extra diode drop appears across the two 1.2 ohm resistors in series).
Second, they act as stabilising negative feedback and prevent thermal runaway.
Third, they act as sensing resistors for output current limiting.
This is where the final two transistors come in. If the voltage across the emitter resistors gets above about 0.6V each (1.2V total), they turn on and apply strong negative feedback. This happens if the output current gets too high (about 500mA).
Negative feedback is applied to the op-amp from the output terminal via the 10k resistor, which should compensate for any non-linearity in the output driver.
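
The emitter-resistor arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes an ideal 0.6V diode drop and exactly 1.2 ohm resistors, and the variable names are mine rather than anything from the schematic.

```python
V_DIODE = 0.6    # assumed diode / base-emitter drop in volts
R_EMITTER = 1.2  # each output emitter resistor in ohms

# Quiescent current: the third diode's drop appears across both
# emitter resistors in series.
i_quiescent = V_DIODE / (2 * R_EMITTER)

# Current limit: a limiter transistor turns on when the drop across
# one emitter resistor reaches about one V_BE.
i_limit = V_DIODE / R_EMITTER

print(f"quiescent current ~ {i_quiescent * 1000:.0f} mA")
print(f"current limit     ~ {i_limit * 1000:.0f} mA")
```

Real diode and base-emitter drops vary with current and temperature, so both figures are rough estimates rather than design values.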
But! But! But! The three-diode arrangement is a classic biasing circuit for SINGLE output transistors. A Darlington pair has TWO base-to-emitter voltage drops, so I would expect there to be five diodes - one each to bias the base-to-emitter voltages of the four output transistors, and one to bias the emitter resistors. Am I missing something?