Let’s start with one particle in one dimension. If we know the position and the momentum of that particle at some time, we can determine its position and momentum at any other time; this is a well-established postulate of classical mechanics. We further quantize the position and the momentum into discrete steps of \(\delta x\) and \(\delta p\). Doing this lets us define some quantities which we will explore later. For now, we can describe this system by two numbers representing the position and the momentum. These are called the quantum numbers that characterize the state of the system, and they give a complete description of it: applying the familiar equations of motion, we can find the state of the system, i.e. its quantum numbers, at any time. This generalizes to any number of particles and dimensions; for example, a particle in three-dimensional space is described by six quantum numbers.
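To make the discretization concrete, here is a minimal sketch; the function name and the cell sizes \(\delta x = \delta p = 0.01\) are purely illustrative:

```python
# Map a classical state (x, p) to a pair of discrete quantum numbers,
# assuming illustrative cell sizes dx and dp.  Each phase-space cell of
# area dx * dp corresponds to one state of the system.
def quantum_numbers(x, p, dx=0.01, dp=0.01):
    return (int(x // dx), int(p // dp))

print(quantum_numbers(0.123, -0.456))  # -> (12, -46)
```

Any two classical states falling inside the same cell share the same pair of quantum numbers, which is what makes counting states possible.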
Let’s now take the example of a particle undergoing simple harmonic oscillation. The energy of the system is a constant \(E\), which, mind you, is also discrete, so classically it is a value between \(E\) and \(E+\delta E\). Knowing the \(x\), \(p\) and \(E\) of the system, we know all the quantum parameters. If we plot \(x\) vs \(p\) for energies between \(E\) and \(E+\delta E\), we get two ellipses, one for \(E\) and one for \(E+\delta E\). If we overlay a grid of cells with spacing \(\delta x\) and \(\delta p\), the number of cells lying within the area between the two ellipses gives a count of the accessible quantum states for energies between \(E\) and \(E+\delta E\). To make use of all this, we need a basic postulate from which we can make predictions. Such a postulate is called an a priori axiom: we assume it to be true in order to make predictions from it. Of course we don’t know whether the axiom holds; we can only verify the predictions we make using it. If the predictions hold repeatedly, we can believe with growing confidence that the axiom indeed holds. One of the fundamental axioms is: “An isolated system in equilibrium is equally likely to be in any of its accessible states, irrespective of time.” Nothing in the laws of mechanics contradicts this statement, and it is a well-established axiom in statistical mechanics. Let’s make a simple prediction. Every state inside the thickness of the ellipse is accessible to the system, hence called an accessible state. According to our postulate, the system is equally likely to be in any of those accessible states. It follows that if a particular value of \(x\), say \(x_{0}\), has more accessible states than another value, say \(x_{1}\), then the system is more likely to have the quantum number \(x_{0}\) than the quantum number \(x_{1}\). From the graph, near \(x=0\) there are fewer states accessible to the system than near the edge.
Thus we conclude the particle is more likely to be found near the edge than near the center. This makes sense intuitively: the particle moves more slowly near the extreme positions and therefore spends more time there.
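The cell-counting argument above can be checked numerically. This is a rough sketch with illustrative values of \(m\), \(\omega\), \(E\), \(\delta E\), \(\delta x\) and \(\delta p\); each grid point stands in for one phase-space cell:

```python
import numpy as np

# Count the phase-space cells accessible to a 1D harmonic oscillator
# with energy in [E, E + dE].  All parameter values are illustrative.
m, omega = 1.0, 1.0
E, dE = 1.0, 0.05
dx = dp = 0.01

x = np.arange(-2.0, 2.0, dx)
p = np.arange(-2.0, 2.0, dp)
X, P = np.meshgrid(x, p)
energy = P**2 / (2 * m) + 0.5 * m * omega**2 * X**2

# A cell is accessible if its energy lies in the shell [E, E + dE).
accessible = (energy >= E) & (energy < E + dE)
print("accessible cells:", accessible.sum())

# Cells per column of fixed x: more near the turning points than at x = 0,
# which is exactly the prediction made in the text.
counts_per_x = accessible.sum(axis=0)
center = counts_per_x[np.argmin(np.abs(x))]
edge = counts_per_x.max()
print("cells near x = 0:", center, "| cells near a turning point:", edge)
```

The column counts near the classical turning points come out much larger than near \(x=0\), matching the conclusion that the particle is most likely found near the edges.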
It’s true that we can predict the states if all the quantum numbers are available to us, but for sufficiently large systems such data is not of interest from a statistical viewpoint. Instead of describing the system by its quantum parameters, we describe it using external parameters such as total energy, volume, density and pressure. Together these parameters are referred to as the macrostate of the system. As one can imagine, there are many possible sets of quantum numbers associated with a single macrostate; each such set is called a microstate of the system. From our initial postulate, a system defined by a macrostate has equal probability of being in each microstate linked to that macrostate. From these definitions we define \(\Omega(E)\) as the number of microstates available to the system when the total energy lies between \(E\) and \(E+\delta E\), and similarly \(\Omega(E; y_{1}\dots y_{n})\) as the number of microstates available to the system subject to external parameters \(y_{1}\dots y_{n}\). Without going into detailed mathematical analysis, we state that \(\Omega(E)\) scales roughly as \(E^{f}\), where \(f\) is the number of quantum parameters. To grasp this, we turn to the simple harmonic oscillator: for larger values of \(E\) the ellipse is larger, increasing the number of microstates, roughly by a factor of \(E^{2}\). As the number of dimensions and particles increases, the number of microstates grows staggeringly fast. For a mole of particles in three dimensions, \(\Omega(E)\) scales roughly as \(E^{10^{24}}\), an astronomically large power.
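To get a feel for how extreme this scaling is, here is a minimal sketch; the value of \(f\) and the energies are illustrative. It shows why one always works with \(\ln \Omega\) rather than \(\Omega\) itself, which would overflow any numeric type:

```python
import math

# Omega(E) ~ E**f is far too large to represent directly for a mole of
# particles, so we work with ln(Omega) = f * ln(E) + const instead.
f = 1e24            # rough number of quantum parameters for a mole in 3D
E1, E2 = 1.0, 1.001  # two nearby energies, in arbitrary units

# Change in ln(Omega) for a mere 0.1% increase in energy:
ln_ratio = f * (math.log(E2) - math.log(E1))
print("ln[Omega(E2)/Omega(E1)] =", ln_ratio)
```

A 0.1% increase in energy multiplies the number of microstates by a factor of about \(e^{10^{21}}\), illustrating just how violently \(\Omega(E)\) grows.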
If a macrostate is defined by \(E_{0}; y_{1}\dots y_{k}\dots y_{n}\) and some constrained parameter \(y_{k}\) is allowed to change, then the probability that \(y_{k}=y\) is \(\mathbf{P}(y)\), which is proportional to the number of microstates available to the system when \(y_{k}=y\). If we put no restriction on \(y_{k}\), it will naturally assume a value \(y\) where \(\mathbf{P}(y)\) is maximized. Hence we conclude that when a constraint is lifted, the system assumes the value of the external parameter that leads to the maximum number of microstates. Let us take two systems \(A\) and \(A'\), each in equilibrium. The system as a whole, \(A^{\langle 0 \rangle}\), is isolated, and hence its total energy \(E^{\langle 0 \rangle}\) is conserved. Suppose the subsystems \(A\) and \(A'\) are allowed to interact such that their external parameters \(y_{1}\dots y_{n}\) are fixed and only the energies of the subsystems are allowed to vary; when \(A'\) is much larger than \(A\) (a heat reservoir), such a statistical ensemble is called the canonical ensemble. The energy of each \(A\) in an ensemble of many such systems does not change by the same amount, so we describe the situation in terms of the mean energy over all the systems. More concisely, \(\Delta \bar{E}=-\Delta \bar{E}'\), so that the total energy change is zero. We call this type of energy change the heat absorbed by the system, \(Q\). Another type of interaction is one in which heat transfer is not possible and instead the external parameters of the subsystems are allowed to vary. In this case too, we describe the interaction in terms of the change in average energy over an ensemble of many subsystems; this time we call it the net work done on the system, \(\mathcal{W}=\Delta \bar{E}\). The work done by the system is then \(W=-\mathcal{W}\). For a general interaction involving both heat exchange and work, we have \(Q=\Delta \bar{E}+W\), which is equivalent to the first law of thermodynamics.
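The sign conventions above are easy to get wrong, so here is the bookkeeping in a few lines; the function name and the numbers are purely illustrative:

```python
# First-law bookkeeping with the sign conventions used in the text:
# dE_bar is the change in mean energy, W is the work done BY the system
# (W = -W_on), and Q = dE_bar + W is the heat absorbed by the system.
def heat_absorbed(dE_bar, W):
    return dE_bar + W

# Mean energy rises by 5 units while the system does 3 units of work,
# so it must have absorbed 8 units of heat:
print(heat_absorbed(5.0, 3.0))  # -> 8.0
```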
We now study the thermal interaction in the system \(A^{\langle 0 \rangle}\) in more detail. Letting the energies of the subsystems be \(E\) and \(E'\) respectively, we have \(E=E^{\langle 0 \rangle}-E'\). The total number of accessible microstates of the system \(A^{\langle 0 \rangle}\) is given by the product of the accessible microstates of the subsystems \(A\) and \(A'\): for each accessible microstate of \(A\), the combined system can be in any accessible microstate of \(A'\). Hence, \(\Omega^{\langle 0 \rangle}(E)=\Omega(E)\Omega'(E')=\Omega(E)\Omega'(E^{\langle 0 \rangle}-E)\propto \mathbf{P}(E)\). The systems will therefore assume the values of \(E\), and consequently \(E'\), for which the number of accessible microstates is maximized. We wish to find the most probable energy distribution of the subsystems, which is achieved analytically by solving \(d\mathbf{P}/dE=0\). \[\begin{align} \mathbf{P}\propto\Omega \cdot\Omega'\,\text{ with }\,\Omega \propto E^{f}\,\text{ and }\,\Omega'\propto E'^{f'} \\ \implies \mathbf{P}\propto E^{f}\cdot E'^{f'} \\ \implies \ln \mathbf{P}=f\ln E+f'\ln E'+c \end{align}\] Since \(E'=E^{\langle 0 \rangle}-E\), we have \(dE'/dE=-1\), so the maximum occurs when \[\begin{align} \frac{ \partial \ln \Omega(E) }{ \partial E } -\frac{ \partial \ln \Omega'(E') }{ \partial E' } = 0 \\ \implies \frac{ \partial \ln \Omega(E) }{ \partial E } =\frac{ \partial \ln \Omega'(E') }{ \partial E' } \\ \implies\beta(E)=\beta'(E') & & (\text{say}) \end{align}\] Hence we say that the system is in thermal equilibrium when \(\beta(E)\) equals \(\beta'(E')\). We can define a convenient quantity \(T\) such that \(kT=1/\beta\); then \(\dfrac{1}{T}=k\beta=\dfrac{ \partial (k\ln\Omega) }{ \partial E }=\dfrac{ \partial S }{ \partial E }\), where \(S=k\ln\Omega\) is the entropy. The peak of the probability curve \(\mathbf{P}(E)\) is very sharp, and \(\mathbf{P}\) decreases rapidly for values of \(E\) far from the value \(\tilde{E}\) at which \(\mathbf{P}\) is maximum. The choice of the temperature variable is really a matter of convention (a different \(k\)) and is separate from the classical idea. Thermodynamically, the temperature variable can be any external variable that changes appreciably when the system is brought into contact with another system not already in thermal equilibrium with it; it can be a length, a volume, a pressure, and so on.
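The sharpness of the peak of \(\mathbf{P}(E)\) can be illustrated numerically. This is a minimal sketch with illustrative values of \(f\) and \(f'\) (real macroscopic systems have \(f \sim 10^{24}\), far beyond anything we can evaluate directly); we work with \(\ln \mathbf{P}\) to avoid overflow:

```python
import numpy as np

# P(E) ~ E**f * (E0 - E)**f' for two subsystems sharing total energy E0.
# Taking equal-sized subsystems (fp = f) puts the peak at E = E0 / 2.
E0 = 1.0
E = np.linspace(0.01, 0.99, 999)

widths = []
for f in (10, 1000, 100000):
    fp = f
    lnP = f * np.log(E) + fp * np.log(E0 - E)
    lnP -= lnP.max()  # normalize so the peak has ln P = 0
    # Fraction of the energy range where P exceeds half its maximum:
    widths.append((np.exp(lnP) > 0.5).mean())

print("peak at E =", E[np.argmax(lnP)])
print("half-max width fractions:", widths)
```

The peak sits at \(\tilde{E}\) where \(f/\tilde{E}=f'/(E^{\langle 0\rangle}-\tilde{E})\), i.e. exactly where \(\beta=\beta'\), and the half-max width shrinks roughly as \(1/\sqrt{f}\), which is why the distribution is so extraordinarily sharp for macroscopic \(f\).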