# On the Conservation of Snow Leopards

You are working with a nature documentary crew looking for snow leopards. Four days in, the guides lead you to the top of a mountain pass dividing two valleys. Snow leopards are sometimes sighted using the pass to move between the valleys.

You carefully place your camera traps and return to camp. Weeks pass with no sightings. You place more camera traps on the other passes and wait. Days come and go but no leopards. Frustrated, you seek out the team’s conservationist. “How many leopards do you think are in this valley?”

“Two,” she says, “on average.”

“Not many for all this space,” you answer, looking out across the rugged miles that surround your mountain camp. You slip into the main tent to check the tapes. You are scrolling through the video feed when, at last, a snow leopard appears on the screen. It crossed the pass last night. It even paused to inspect the camera before moving on. Thrilled, you wake the team and replay the tape. Everyone celebrates. It was a long month waiting.

The conservationist is watching over your shoulder. You look up. “I guess there’s only one in the valley left to find.”

She shrugs. “I still think there’s two out here.”

“But we saw one leave. How can there still be two?”

She shrugs again and smiles. “Two,” she says, “on average.”

## Introduction

This is an example of a problem in expectation. The actual number of snow leopards in the valley is unknown. But, on average, there are two leopards in the valley. The question is how this expected number should update after seeing a snow leopard leave the valley.

At first the answer seems obvious. Every time a leopard leaves the valley the total number of leopards in the valley decreases by one. So the expectation should decrease by one.

But should it?

Suppose that the expected number of leopards in the valley is $0.5$ instead of $2$. When one leopard leaves the valley, the expected number of leopards remaining in the valley cannot be $0.5 - 1 = -0.5$—there is no such thing as an anti-leopard. In this case it is obvious that we cannot update the expectation by subtracting one leopard from the expectation.

The subtlety here is in the distinction between expecting that there are a certain number of leopards in the valley and knowing the number. If we had counted all the leopards then we would know the number exactly. This number would match the expectation and would decrease by one when a leopard leaves. If we do not know the number of leopards for sure then the expectation is an average. Knowing the average number is not the same as knowing the actual number of leopards because it leaves the actual number uncertain. This uncertainty means that there is implicitly a probability distribution on the size of the leopard population.

For example, if the expected number of leopards is $0.5$ there is not actually half a leopard. There is no more a fractional leopard than an anti-leopard. It means that there is a nonzero probability that there are zero leopards, and a nonzero probability that there are one or more leopards.

Now we see a leopard leave. This would have been impossible if there were zero leopards in the valley, since leopards cannot appear from thin air. So there was at least one leopard in the valley. The observation event has taught us something: given that we saw a leopard leave, we should ignore the possibility that there had been zero leopards. As a result, the conditional expectation of the number of leopards in the valley before one left is always at least one. In turn, the new expectation can never be negative.

Speaking more broadly, the more leopards in the valley the more likely it is to observe one leaving. Therefore, the observation event carries information about the number of leopards in the valley before the event. The fact we saw one leave means there could not have been zero—there may have been more than we thought. This means we should revise our old expectation upwards before subtracting the individual who left. The more uncertain we were before seeing a leopard leave the more we should revise upwards.

Revising the expectation upward before subtracting the wandering leopard is common sense in other contexts. For example, consider a fisherman who tries a new pond and has the most successful day of his fishing career. The fisherman is likely to return to the pond: the fact that he caught fish there means there are (or at least were) fish in the pond. As long as he continues to catch fish, he is likely to return. Every catch is evidence that the pond contains fish, even though it depletes the pond.

Here we formalize this problem to show how the expectation should be updated after observing a leopard leave. This requires introducing some notation and formalizing the relationship between the rate at which leopards leave the valley and the number of leopards in the valley.

## Model

Let $X(t)$ be the number of snow leopards in the valley at time $t$. This is an example of a stochastic process since $X(t)$ is a random variable for each $t$. Let $p(x,t)$ be the probability $X(t) = x$.

Then the expected number of leopards at time $t$ is:

$\bar{x}(t) = \mathbb{E}[X(t)] = \sum_{x=0}^{\infty} x p(x,t).$

Now we need a model for when snow leopards cross the pass.

Since snow leopards do not wear wrist-watches, and since all our efforts to speak to snow leopards have failed, the timing of each crossing is unpredictable so is best modeled as a random variable. This is an example of a counting process. Each time a leopard crosses the pass we count an additional crossing, but the timing of the crossings is random. In order to model the crossing we need to specify a probability distribution that returns a probability (or probability density) for any sequence of crossing times.

A natural way to construct such a distribution is to define an expected, or average, event rate, $\lambda(x)$. If we let $N([t,t+\Delta t])$ be the number of events that occur in the interval $[t,t+\Delta t]$, then this usually means that:

$\text{Pr}\{N([t,t+\Delta t]) = n\} = \begin{cases} 1 - \lambda(X(t)) \Delta t + \mathcal{o}(\Delta t) & \text{ if } n = 0 \\ \lambda(X(t)) \Delta t + \mathcal{o}(\Delta t) & \text{ if } n = 1\\ \mathcal{o}(\Delta t) & \text{ if } n \geq 2 \end{cases}.$

Here $\mathcal{o}(\Delta t)$ represents any function of $\Delta t$ that converges to zero faster than $\Delta t$ as $\Delta t$ goes to zero. That is, a function $f(\Delta t)$ is $\mathcal{o}(\Delta t)$ if $\lim_{\Delta t \rightarrow 0} f(\Delta t)/\Delta t = 0$. Under these assumptions, and as long as the rate $\lambda$ stays constant over the interval, the expected number of events in any time interval is simply $\lambda$ times the length of the time interval. This sort of counting process, a Poisson process, is widely used to model rare events. For example, this is the precise probabilistic description for the decay of radioactive material.
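As a quick numerical check of these small-interval probabilities, the sketch below assumes a constant rate, so that the number of events in a window of length $\Delta t$ is exactly Poisson with mean $\lambda \Delta t$. It prints the gap between each exact probability and its linear approximation, divided by $\Delta t$; these ratios should shrink toward zero as $\Delta t$ shrinks. The function name `window_probabilities` and the rate value are illustrative choices, not part of the original analysis.

```python
import math

def window_probabilities(rate, dt):
    """Exact Pr{N = 0}, Pr{N = 1}, Pr{N >= 2} for a constant-rate Poisson process."""
    mean = rate * dt
    p0 = math.exp(-mean)
    p1 = mean * math.exp(-mean)
    return p0, p1, 1.0 - p0 - p1

rate = 2.0  # illustrative constant event rate (an assumption for this demo)
for dt in (1e-1, 1e-2, 1e-3):
    p0, p1, p2 = window_probabilities(rate, dt)
    # Each remainder is an o(dt) term, so remainder / dt should tend to zero.
    remainders = (
        (p0 - (1 - rate * dt)) / dt,
        (p1 - rate * dt) / dt,
        p2 / dt,
    )
    print(dt, remainders)
```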

What remains is to specify $\lambda(x)$, the rate at which we expect to see leopards leave the valley if there are $x$ leopards in the valley. It is reasonable to assume that this rate increases with the number of snow leopards in the valley. The actual dependence of $\lambda$ on $x$ depends on how leopards interact while dispersing. A highly social animal is likely to stay near other members of its species, so the rate at which any individual leaves a group may decrease as the group grows. In this case $\lambda(x)$ will be sublinear (growing more slowly than in proportion to $x$), and may even decrease in $x$ for large enough $x$. Territorial animals may actively avoid each other while dispersing, hence $\lambda(x)$ may be superlinear in $x$. In general it is only possible to address the question, “How many do you think are out there now?”, once $\lambda(x)$ is specified. Here we give the solution for a particular $\lambda(x)$, and provide details on the general case as a supplement.

Under most migration models it is reasonable to assume that the rate $\lambda(x)$ is proportional to $x$. This is often assumed for one of two reasons:

1. Linear models are easy to treat analytically and often give sufficiently good approximations when $x$ does not vary greatly.
2. Linear models match physical systems in which individuals disperse independently of one another.

Both reasons are in play here. It is always better to start with a tractable model in order to understand the fundamental components of a problem. Moreover, the ubiquity of linear transition rates in applications makes linear rates an important test case. Finally, snow leopards are famously solitary animals (“the only prolonged social contact in snow leopards is that of a female and her dependent offspring . . . no evidence was found to substantiate territoriality” [2]), so it is not unreasonable to start by modeling their dispersal as independent.

If each individual disperses independently of the other individuals, then:

$\lambda(x) = \alpha x$

for some per capita rate $\alpha$. The per capita (per individual) rate $\alpha$ is simply the rate at which any individual is expected to leave the valley.
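To see why independent dispersal gives a rate proportional to $x$, here is a minimal simulation sketch. It assumes each leopard waits an exponentially distributed time with rate $\alpha$ before leaving (an assumption consistent with the Poisson-type model above, but not stated explicitly in the text), and estimates the rate of the first departure among $x$ leopards. The function name and parameter values are illustrative.

```python
import random

def mean_time_to_first_departure(x, alpha, trials=20000, seed=0):
    """Monte Carlo estimate of the mean wait until the first of x leopards leaves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each leopard independently leaves after an exponential time with rate alpha;
        # the first departure is the minimum of those waiting times.
        total += min(rng.expovariate(alpha) for _ in range(x))
    return total / trials

alpha = 0.5  # illustrative per capita rate
for x in (1, 2, 4, 8):
    estimated_rate = 1.0 / mean_time_to_first_departure(x, alpha)
    # The estimated rate should sit close to alpha * x.
    print(x, round(estimated_rate, 3), alpha * x)
```

Doubling the number of leopards should roughly double the estimated departure rate, up to sampling noise.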


Demo 1: Each point represents a snow leopard and the rectangle the valley. The leopards move randomly, and when one leaves the valley, another leopard is placed randomly in the valley. The graph shows the number of leopards that have left. Notice that, while the leopards leave at random times, the average rate at which they leave is proportional to the number of leopards in the valley. Test this by changing the number of leopards in the valley (move the slider).

We are now equipped to state the question formally.

## Problem

Suppose that the transition rate is linear in the number of leopards in the valley, the expected number of leopards in the valley before observing one leave is $\bar{x}$, and one is observed leaving. What is the new expectation?

## Solution

If the expected number of leopards in the valley before the event was $\bar{x}$, and the variance in the number of leopards in the valley before the event is $v$, then the expected number after the event is $\bar{x} - 1 + v/\bar{x}$.
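As a quick sanity check, return to a valley with $\bar{x} = 0.5$, and suppose, purely for illustration (these probabilities are invented for the example), that there are zero, one, or two leopards with probabilities $0.6$, $0.3$, and $0.1$. Then:

$\bar{x} = 0(0.6) + 1(0.3) + 2(0.1) = 0.5, \quad v = \left(1^2(0.3) + 2^2(0.1)\right) - 0.5^2 = 0.45.$

The formula gives:

$\bar{x} - 1 + \frac{v}{\bar{x}} = 0.5 - 1 + \frac{0.45}{0.5} = 0.4.$

So in this example the expectation falls by only $0.1$ rather than by a full leopard, and it stays nonnegative.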

## Proof

The proof is organized as follows. First we show that if a sufficiently small time window is chosen around the event time, then it can be assumed that only one event occurs during the time window. We then use the asymptotic form for the probability that one event occurred during the window to compute the conditional probability that there were $x$ leopards in the valley. We then average over this distribution to compute the expected number of leopards in the valley after one is seen leaving.

Suppose that an event occurs at time $t$. Consider the time interval $[t - \Delta t, t + \Delta t]$ for small $\Delta t$. By assumption, the probability that more than one event occurs in the interval is $\mathcal{o}(\Delta t)$. We condition on at least one such event occurring. The probability of at least one event occurring is proportional to $\Delta t$. This means that the probability of more than one event occurring, conditioned on an event occurring, is proportional to $\mathcal{o}(\Delta t)/(\Delta t)$, which, by definition, converges to zero as $\Delta t$ goes to zero. Therefore, for sufficiently small $\Delta t$ we can assume that only one transition event occurred in the time interval.

Since one event occurred during the interval, $p(x,t+\Delta t) = \text{Pr}\{X(t + \Delta t) = x\}$ is the conditional probability:

$p(x,t+\Delta t) = \text{Pr}\{X(t - \Delta t) = x + 1| N([t-\Delta t, t + \Delta t]) = 1 \}.$

Or, equivalently:

$p(x-1,t+\Delta t) = \text{Pr}\{X(t - \Delta t) = x| N([t-\Delta t, t + \Delta t]) = 1\}.$

That is, the probability there are $X = x - 1$ leopards in the valley after seeing the event is the probability that there were $x$ leopards in the valley before seeing the event, given that an event occurred. To compute this conditional probability we will use Bayes’ rule.

Now, using the fact that $\text{Pr}\{B|A\} = \text{Pr}\{B\cap A\}/\text{Pr}\{A\}$:

$\text{Pr}\{X(t -\Delta t) = x | N([t-\Delta t, t + \Delta t]) = 1 \} = \frac{\text{Pr}\{X(t-\Delta t) = x \cap N([t-\Delta t, t + \Delta t]) = 1\}}{\text{Pr}\{N([t-\Delta t, t + \Delta t]) = 1 \}}.$

To compute the joint probability that there were $x$ leopards in the valley and one left, we use $\text{Pr}\{B \cap A \} = \text{Pr}\{A|B\} \text{Pr}\{B\}$.

The probability $X(t - \Delta t) = x$ is $p(x,t - \Delta t)$ by definition. If $X(t -\Delta t) = x$ then the probability one event occurred in the interval $[t - \Delta t, t + \Delta t]$ is $2 \alpha x \Delta t + \mathcal{o}(\Delta t)$. Therefore:

$\text{Pr}\{ X(t-\Delta t) = x \cap N([t-\Delta t, t + \Delta t]) = 1 \} = (2 \alpha x \Delta t) p(x,t-\Delta t).$

This joint probability is the numerator in the conditional probability we are solving for.

For the denominator we need the probability that one leopard left. To do this, sum the expression given above over all possible $x$:

$\text{Pr}\{N([t-\Delta t, t + \Delta t]) = 1 \} = \sum_{x = 0}^{\infty} (2 \alpha x \Delta t) p(x,t-\Delta t) .$

Then, substituting the numerator and denominator in and simplifying:

$\begin{aligned} p(x-1,t+\Delta t) & = \text{Pr}\{X(t -\Delta t) = x | N([t-\Delta t, t + \Delta t]) = 1 \} \\ & = \frac{(2 \alpha x \Delta t)}{\sum_{y = 0}^{\infty} (2 \alpha y \Delta t)p(y,t - \Delta t)} p(x,t - \Delta t) \\ & = \frac{x}{\sum_{y = 0}^{\infty} y p(y,t - \Delta t)} p(x,t - \Delta t) \\ & = \frac{x}{\bar{x}(t - \Delta t)} p(x,t - \Delta t) . \end{aligned}$

To finish, take the limit as $\Delta t$ goes to zero:

$p(x-1,t+dt) = \frac{x}{\bar{x}(t-dt)} p(x,t-dt).$

This is the probability that there are $x-1$ leopards in the valley given that one left the valley at time $t$. Here $dt$ represents an infinitesimally small time step and is retained to distinguish times immediately preceding and immediately following the transition event.
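The update rule can be sketched in a few lines of code. The snippet below is a minimal illustration, assuming the prior is stored as a list whose entry `prior[x]` is the probability of $x$ leopards before the event; the function name `condition_on_departure` and the example prior are hypothetical.

```python
def condition_on_departure(prior):
    """Return the distribution after one observed departure.

    prior[x] is the probability of x leopards just before the event.
    The result is indexed by the number of leopards remaining afterwards.
    """
    mean = sum(x * p for x, p in enumerate(prior))
    if mean == 0:
        raise ValueError("a departure is impossible if the valley is surely empty")
    # p(x - 1, t + dt) = x * p(x, t - dt) / mean, for x = 1, 2, ...
    # Dropping index 0 implements the shift down by one leopard.
    return [x * p / mean for x, p in enumerate(prior)][1:]

prior = [0.6, 0.3, 0.1]  # illustrative prior with mean 0.5
posterior = condition_on_departure(prior)
print(posterior)  # roughly 0.6 for zero leopards left and 0.4 for one left
```

Note that the state with zero leopards before the event receives zero weight, so it drops out automatically.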

A convenient way to think about this equation is that $x p(x,t - dt)$ is the rate at which probability flows out of the state: [there were $x$ leopards at time $t - dt$], and into the state: [there are now $x - 1$ leopards at time $t + dt$]. The rate of a probability flow is a probability flux, $j$. The product $x p(x,t-dt)$ is the probability flux $j(x,t-dt)$. Therefore, $p(x-1,t+dt)$ is proportional to the distribution of probability fluxes $j(x,t-dt)$, normalized by $\bar{x}(t-dt)$ and shifted down by one. The animation below shows the initial distribution of leopards $p(x,t-dt)$ transforming into the probability fluxes $j(x,t-dt) = x p(x,t-dt)$ and then scaling and shifting to recover the distribution after the leopard left, $p(x,t + dt)$.

Figure 1: The distribution before the event is shown in red. It is then transformed into the probability fluxes, shown in purple. For example, the probability flux leaving the state representing ten snow leopards is the probability there were ten leopards in the valley times the rate at which leopards would leave the valley if there were ten in the valley. The distribution of fluxes is then normalized and shifted down by one to account for the leopard leaving. This gives the conditional distribution for the number of leopards in the valley given that an event occurred, shown in blue.

Now that we have the probability $X(t+dt) = x - 1$ given that an event occurred at time $t$ we can compute the new expectation:

$\bar{x}(t+dt) = \mathbb{E}[X(t+dt)| \text{an event occurred at time } t] = \sum_{x = 0}^{\infty} x p(x, t + dt).$

Substituting in for $p(x,t+dt)$ in terms of the old distribution:

$\bar{x}(t+dt) = \sum_{x = 0}^{\infty} x \frac{(x+1)}{\bar{x}(t-dt)} p(x+1,t-dt).$

Let $y = x+1$. Then:

$\begin{aligned} \bar{x}(t+dt) & = \sum_{y = 1}^{\infty} (y-1) \frac{y}{\bar{x}(t-dt)} p(y,t-dt) \\ & = \frac{1}{\bar{x}(t-dt)}\sum_{y = 0}^{\infty} y (y-1) p(y,t-dt) \\ & = \frac{1}{\bar{x}(t-dt)} \mathbb{E}[X(t-dt)(X(t-dt)-1)] \\ & = \frac{1}{\bar{x}(t-dt)}(\mathbb{E}[X(t-dt)^2] - \bar{x}(t-dt)) \\ & = \frac{\mathbb{E}[X(t-dt)^2]}{\bar{x}(t-dt)} - 1. \end{aligned}$

To simplify the equation note that the expected value of a random variable squared is the same as the variance in the random variable plus the expected value of the random variable squared.

Let $v(t)$ denote the variance in $X(t)$. Then $\frac{\mathbb{E}[X(t-dt)^2]}{\bar{x}(t-dt)} = \frac{v(t-dt) + \bar{x}(t-dt)^2}{\bar{x}(t-dt)} = \frac{v(t-dt)}{\bar{x}(t-dt)} + \bar{x}(t-dt)$. Therefore:

$\bar{x}(t+dt) = \bar{x}(t-dt) - 1 + \frac{v(t-dt)}{\bar{x}(t-dt)}. \quad \blacksquare$
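The closed form can be checked numerically against a direct Bayes update. The sketch below uses an arbitrary illustrative prior with mean two (echoing the conservationist's guess); the function names are hypothetical and the prior is an assumption made only for this check.

```python
def moments(dist):
    """Mean and variance of a distribution stored as dist[x] = Pr{X = x}."""
    mean = sum(x * p for x, p in enumerate(dist))
    second = sum(x * x * p for x, p in enumerate(dist))
    return mean, second - mean ** 2

def mean_after_departure_direct(prior):
    """Reweight by x, normalize, shift down by one, then take the mean."""
    mean, _ = moments(prior)
    posterior = [x * p / mean for x, p in enumerate(prior)][1:]
    return moments(posterior)[0]

def mean_after_departure_formula(prior):
    """Closed form: old mean - 1 + variance / old mean."""
    mean, var = moments(prior)
    return mean - 1 + var / mean

prior = [0.1, 0.2, 0.4, 0.2, 0.1]  # illustrative prior: mean 2, variance 1.2
print(mean_after_departure_direct(prior))
print(mean_after_departure_formula(prior))
```

Both routes should give about $1.6$ for this prior: because the variance is smaller than the mean, the expectation drops, but by less than a full leopard.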

## Discussion

This equation is easy to interpret. The new expectation is the old expectation minus one leopard, since we saw a leopard leave, plus our uncertainty in the number of leopards. We add the uncertainty because seeing a leopard leave is evidence that there may have been more leopards in the valley than we’d thought. Notice that if we had no uncertainty, then we knew the number of leopards in the valley, so the new expectation is just the old expectation minus one.

Here uncertainty is measured by the variance divided by the mean. Strictly speaking, this ratio is the index of dispersion (or Fano factor); the coefficient of variation is usually defined as the standard deviation divided by the mean. We use the label CV for the ratio $v/\bar{x}$ throughout. It is a natural measure of uncertainty in this context, since it measures the uncertainty relative to the mean. If we think leopards are rare, then the mean is small, and if we are very uncertain about the number of leopards, then the variance is large. This is precisely the case when observing a leopard should change our expectation the most. Accordingly, the CV is largest when we expect leopards to be rare but are very uncertain about their number. This occurs when the distribution $p(x,t)$ is skewed positive.

So who was right? Our fictional (idealized wilderness) self or the conservationist?

It depends on the coefficient of variation. If the conservationist knew the CV then she could answer exactly. The CV could be known empirically (by studying the population of leopards in many valleys), or could be computed if it is assumed that leopards are distributed according to a one-parameter family of distributions.

Sticking to our modeling approach, let’s see what happens if we pick a distribution. The natural first choice is a Poisson distribution, since many rare items are Poisson-distributed.

Figure 2: Poisson distributions with means 0.5, 4, and 8 (blue, purple, and red respectively). Note that the larger the mean the larger the variance in the distribution.

Suppose that the leopards are Poisson-distributed at time $t$ with mean $\bar{x}(t)$. This means that:

$p(x,t) = \text{Pr}\{X(t) = x \} = \frac{\bar{x}^x}{x!} \exp(- \bar{x}).$

Remarkably, the Poisson distribution has variance equal to its mean.

Since the coefficient of variation is the variance divided by the mean, the CV of the Poisson distribution equals $1$. But then:

$\bar{x}(t + dt) = \bar{x}(t-dt) - 1 + \frac{\bar{x}(t-dt)}{\bar{x}(t-dt)} = \bar{x}(t-dt).$

The expected number of leopards after observing one leave is the same as the expected number before seeing one leave!

Even more provocatively, no matter how many times we see a leopard leave, our expectation does not change. That is, Poisson-distributed leopards are conserved in expectation.

We can go further. Not only does the expectation stay the same: if $p(x,t-dt)$ is a Poisson distribution, then $p(x,t+dt)$ is also Poisson and $p(x,t-dt) = p(x,t+dt)$. In this case the entire distribution is conserved, not just the expectation! Hence, once the departing leopard is subtracted, the observation event carries no net information about the number of leopards in the valley. This is illustrated by the animation below.
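This conservation is easy to confirm numerically. The sketch below builds a Poisson prior truncated at a large count (the truncation point of 60 and the mean of 2 are illustrative assumptions), applies the departure update, and reports the largest pointwise difference between the distributions before and after.

```python
import math

def poisson_pmf(mean, size):
    """Poisson probabilities for x = 0, ..., size - 1 (the tail is truncated)."""
    return [math.exp(-mean) * mean ** x / math.factorial(x) for x in range(size)]

def condition_on_departure(prior):
    """Reweight by x, normalize by the mean, and shift down by one leopard."""
    mean = sum(x * p for x, p in enumerate(prior))
    return [x * p / mean for x, p in enumerate(prior)][1:]

prior = poisson_pmf(2.0, 60)
posterior = condition_on_departure(prior)

# Compare the two distributions state by state (zip stops at the shorter list).
max_gap = max(abs(a - b) for a, b in zip(prior, posterior))
print(max_gap)  # should be at the level of floating-point rounding
```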

Figure 3: The distribution before the event is shown in red. Here it is assumed to be Poisson. It is then transformed into the probability fluxes, shown in purple. The distribution of fluxes is then normalized and shifted down by one to account for the leopard leaving. This gives the conditional distribution for the number of leopards in the valley given that an event occurred, shown in blue. The blue distribution is the same as the red distribution, so the distribution is conserved when conditioning on the observation of a leopard leaving.

In this case the conservationist has made the most consistent prediction. The expected number of leopards should stay the same even though one was observed leaving.

What about the expected number of leopards in the neighboring valley? If the expected number in our valley stayed the same surely the expected number in the neighboring valley does as well?

This is not true. The expected number of leopards in the neighboring valley increases by one, exactly as we might have thought before doing all this math. How is that possible? The expected number in the neighboring valley increases by one, since the rate at which leopards enter a valley is independent of the number of leopards in that valley. It follows that seeing a leopard enter a valley tells us nothing about the number in the valley before the observation event. So all we need to do is add the new leopard to our previous expectation.

This is the key idea. When modifying an expectation to account for an observed event we need to ask: does the observation convey information about our original expectation? If it does, then we modify our old expectation before subtracting or adding the number of leopards entering or leaving. If it doesn’t, then we simply add or subtract the observed leopards without revising the old expectation first.
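Both cases fit a single Bayes update: weight each prior state by the event rate in that state, normalize, and then shift by the observed change. The sketch below is a hedged illustration; the function `condition_on_event`, the rate callables, and the example prior are all hypothetical choices made for this demonstration.

```python
def condition_on_event(prior, rate, change):
    """Distribution of the count just after an observed event.

    prior[x] : probability of x leopards just before the event
    rate(x)  : event rate when there are x leopards (illustrative callable)
    change   : +1 for an arrival, -1 for a departure
    """
    weights = [rate(x) * p for x, p in enumerate(prior)]
    total = sum(weights)
    # States with zero weight could not have produced the event, so skip them.
    return {x + change: w / total for x, w in enumerate(weights) if w > 0}

def mean(dist):
    return sum(x * p for x, p in dist.items())

prior = [0.1, 0.2, 0.4, 0.2, 0.1]  # illustrative prior with mean 2

# Arrival: the rate does not depend on x, so the prior is not reweighted.
after_arrival = condition_on_event(prior, lambda x: 1.0, +1)
# Departure: the rate is proportional to x, so larger counts gain weight.
after_departure = condition_on_event(prior, lambda x: 0.5 * x, -1)

print(mean(after_arrival), mean(after_departure))
```

For this prior the arrival moves the mean from $2$ to about $3$, exactly one leopard, while the departure moves it to about $1.6$ rather than $1$.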

So, while the expected number of leopards in our valley is conserved, the expected total number of leopards across the two valleys is not. Instead it increases by one every time we see a leopard walk out of our valley. An expected leopard has, in fact, entered the pair of valleys directly from the probabilistic ether!

The sudden appearance of a new expected leopard seems strange, since it violates our intuition about the conservation of expectation. A leopard leaving a valley does not change the total number of leopards in the two valleys, but, with these assumptions, the expected number of leopards increases every time we see a leopard move between valleys. Taken in isolation this would mean that seeing the same leopard walk back and forth between the valleys would make our expected total number increase and increase and increase. That is obviously wrong.

The natural balance to this effect is that not seeing leopards is evidence that we should decrease our expectation. After all, if our hypothetical film crew waited a year, then they would conclude that leopards are rare, and if they had to wait a decade, then they might conclude that leopards are (at least locally) extinct. In general, the expected number of leopards should decay continuously in between observation events. Using the same modeling framework it is possible to show that this is, in fact, the case. Moreover, the rate at which the expectation decays between events balances the increase in expectation after each observation event.
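One way to see the decay is the following sketch, which rests on assumptions that go beyond the text: every departure is observed, no leopards arrive, and the per capita rate $\alpha$ is known. Under those assumptions, a quiet interval of length $\tau$ has probability $\exp(-\alpha x \tau)$ when there are $x$ leopards, so Bayes' rule reweights the prior by that factor. For a Poisson prior with mean $m$, this reweighting gives a Poisson distribution with mean $m \exp(-\alpha \tau)$, which the code compares against the numerical update. All names and parameter values are illustrative.

```python
import math

def poisson_pmf(mean, size):
    """Poisson probabilities for x = 0, ..., size - 1 (the tail is truncated)."""
    return [math.exp(-mean) * mean ** x / math.factorial(x) for x in range(size)]

def condition_on_quiet_interval(prior, alpha, tau):
    """Posterior after seeing no departures for a time tau (assumed model)."""
    # With x leopards, the chance that none leaves during tau is exp(-alpha * x * tau).
    weights = [p * math.exp(-alpha * x * tau) for x, p in enumerate(prior)]
    total = sum(weights)
    return [w / total for w in weights]

def mean(dist):
    return sum(x * p for x, p in enumerate(dist))

alpha = 0.5                    # illustrative per capita rate
prior = poisson_pmf(2.0, 60)   # illustrative Poisson prior with mean 2
for tau in (0.0, 0.5, 1.0, 2.0):
    posterior = condition_on_quiet_interval(prior, alpha, tau)
    # Numerical posterior mean next to the closed form 2 * exp(-alpha * tau).
    print(tau, mean(posterior), 2.0 * math.exp(-alpha * tau))
```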

Let’s put it all together. While we are waiting to see a leopard our expectation decays slowly. When we finally see one leave our valley we keep the expected number in our valley the same, but add a leopard to the expected number in the neighboring valley. On the other hand, if we see one enter our valley then we keep the expected number in the neighboring valley the same, and increase the expected number in our valley by one. Then we wait again. On the whole the process will keep our expectation near the true number of leopards. This is illustrated in the animation below.

Figure 4: Simulation of leopards moving between two valleys. The thin lines represent the actual number of leopards in the valleys, and the thick lines represent the expectation per valley. The simulation starts with three leopards in the first valley and nine in the second. The first valley is smaller than the second, so the rate of transition out of the first valley is faster than the rate of transition out of the second valley (per capita rates 1 and 1/3 respectively). The expectations start at 4 and 5 leopards. Notice that this is the wrong total number of leopards. As the simulation progresses, the observed transition events inform the expectations. This corrects the error in the total expected number of leopards, which approaches 12. After the first few events the expectations start to track the actual number of leopards. Also notice that the expectation in a given valley only jumps when a leopard enters, not when one leaves.

## In all seriousness

This problem was motivated by a study of chemical signaling at the cellular scale. Cells signal each other by releasing signaling molecules, which diffuse through the inter-cellular medium and bind to receptors on other cells. The rate at which the receptors bind to the signaling molecule is proportional to the number of signaling molecules. The receptors play the same role as the camera traps in the previous examples. The signal released by the transmitting cell is encoded in the number of signaling molecules. The receiving cell receives this signal indirectly through observation events (i.e., binding events at receptors). How well could a receiving cell estimate the number of signaling molecules in solution based on occasional observations of binding events?

The example provided here shows that when the rate of observable events depends on the state of a hidden variable, observing an event carries information about the hidden variable, which should influence our expectation about the hidden variable.

## More on Filming Leopards

Planet Earth II Documentary on Filming Leopards

1. Billingsley, Patrick. Probability and measure. John Wiley & Sons, 2008. pp. 297-307.

2. Jackson, Rodney Malcolm. Home range, movements and habitat use of snow leopard (Uncia uncia) in Nepal. PhD diss., University of London, 1996. pp. 135-136.

3. Anderson, David F., and Thomas G. Kurtz. Stochastic analysis of biochemical systems. Vol. 1. Berlin: Springer, 2015. p. 36.

© 2021, Built by Alex and Jack Strang