created 2020-03-19, last updated on 2020-03-19

Pooled testing can significantly increase COVID-19 testing capacity using existing resources. The core idea is to mix samples taken from multiple individuals and run them as a single test. I first heard of the idea in a Times of Israel article.

The idea should be immediately recognizable to a computer scientist or a software engineer. Indeed, many abstract data structures use similar techniques. Bloom filters come to mind, as well as cumulative acknowledgments in TCP.

The idea of testing 64 people at a time while bearing the cost of only a single laboratory test is quite appealing.

Two questions come to mind. The first and most critical question is whether a single lab test is sensitive enough to detect the virus when samples from many individuals are mixed together. According to the Technion and Rambam Health Care Campus, the answer seems to be affirmative.

The second question is how to choose the correct pool size. If the pool is too large, it will almost certainly contain at least one infected individual, and a single positive sample causes the whole batch to be discarded. Knowing that someone in a pool of 1000 people is infected does not help much. We would like to size the pool such that many of the tested pools come back negative and a large number of people can be labeled as free of COVID-19.

If *K* is the pool size and *D* is the
frequency at which we must discard pools, expressed as one pool
in every *D*, then *K* and *D* should be
numerically close. Here is the intuitive explanation:

For the sake of example, assume a pool size *K* of 16
and *D* equal to 60. Then every 60th batch would need to
be re-tested individually. However, only 16 people are in each
batch. It makes sense to increase *K* so as
to test as many people as possible in the same pool.

If *K* is larger than *D*, increasing
*K* further yields worse and worse returns.
As the pool size increases, more and more pools need
to be discarded, wasting resources.

Let *C* denote the probability that all samples in a pool
are clean, that is, the probability that no one in the pool is
infected and the entire pool tests negative. *C* is given by the
expression:

C = (1 - *r*)^*K*

where ^ denotes exponentiation and *r* denotes the
proportion of contaminated individuals in a population. The
expression assumes that individuals are infected independently
of one another.

The probability that the pool needs to be discarded is given by the expression:

1 - C

It follows that the frequency *D* at which we discard
pools is given by the expression:

D = 1 / (1 - C)
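
As a quick numerical check of the two expressions above, here is a minimal Python sketch. The function names `clean_probability` and `discard_period` are illustrative and do not come from the spreadsheet, and the inputs r = 0.01 and K = 16 are arbitrary example values. The sketch assumes, as the formula for C does, that individuals are infected independently.

```python
import math


def clean_probability(r: float, k: int) -> float:
    """C = (1 - r)^K: probability that nobody in a pool of k is infected."""
    return (1.0 - r) ** k


def discard_period(r: float, k: int) -> float:
    """D = 1 / (1 - C): on average, one pool in D must be discarded."""
    c = clean_probability(r, k)
    if c >= 1.0:  # r == 0: no pool is ever discarded
        return math.inf
    return 1.0 / (1.0 - c)


if __name__ == "__main__":
    # Arbitrary example: 1% contamination rate, pools of 16.
    print(f"C = {clean_probability(0.01, 16):.3f}")  # C = 0.851
    print(f"D = {discard_period(0.01, 16):.2f}")     # D = 6.73
```

With these example inputs roughly one pool in seven comes back positive, so K = 16 is already larger than D, which is the regime of diminishing returns described earlier.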

A separately
available spreadsheet lists *D* for various
contamination rates and pool sizes.

From this spreadsheet and given current contamination rates, it appears that the USA and Israel could use pools of size 64, South Korea and France pools of size 32, and Italy and Switzerland pools of size 16.
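
The rule of thumb that K and D should be numerically close can be turned into a small selection routine. The following Python sketch is one possible reading of that rule, not the method used to build the spreadsheet: it picks, among power-of-two candidates, the K whose ratio K/D is closest to 1. The function name `suggest_pool_size`, the candidate list, and the contamination rates in the example are all hypothetical; in particular, the rates are not the actual figures for any country.

```python
import math


def discard_period(r: float, k: int) -> float:
    """D = 1 / (1 - C), with C = (1 - r)^K."""
    # 1 - C is computed as -expm1(K * log1p(-r)) to stay accurate for tiny r.
    return 1.0 / (-math.expm1(k * math.log1p(-r)))


def suggest_pool_size(r: float, candidates=(2, 4, 8, 16, 32, 64, 128)) -> int:
    """Return the candidate K for which K / D is closest to 1 (on a log scale)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    return min(candidates, key=lambda k: abs(math.log(k / discard_period(r, k))))


if __name__ == "__main__":
    # Hypothetical contamination rates, chosen only for illustration.
    for r in (1 / 4096, 1 / 1024, 1 / 256):
        k = suggest_pool_size(r)
        print(f"r = 1/{round(1 / r)}: K = {k}, D = {discard_period(r, k):.1f}")
```

Comparing K and D on a logarithmic scale is a design choice made here so that being off by a factor of two in either direction counts equally; other notions of "close" would work as well.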

For pool size *K*, if all samples are clean, we have cleared *K* individuals with a single test. If at least one sample is not clean, then we need to test each individual one by one.

Let *O* denote the expected number of tests required
to test *K* people. *O* is given by:

O = P(all clean) * 1 + P(not all clean)*(1+*K*)

However, P(all clean) is just *C* as defined above. Thus, to test *K* individuals, the expected number of tests is:

*O* = *C* + (1 - *C*) * (1+*K*)

Expanding the above expression yields:

*O* = 1 + *K* (1 - *C*)
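
The closed form for O can be sanity-checked with a small Monte Carlo simulation of the two-stage procedure. The Python sketch below is illustrative only: the function names, the seed, and the example values r = 0.01 and K = 16 are assumptions, and it models an idealized test with no false positives or false negatives and independent infections.

```python
import random


def expected_tests(r: float, k: int) -> float:
    """Closed form: O = 1 + K * (1 - C), with C = (1 - r)^K."""
    return 1 + k * (1 - (1 - r) ** k)


def simulate_tests_per_pool(r: float, k: int, pools: int, seed: int = 0) -> float:
    """Average number of tests per pool over `pools` simulated pools."""
    rng = random.Random(seed)
    total = 0
    for _ in range(pools):
        # The pool is positive if at least one of its k members is infected.
        positive = any(rng.random() < r for _ in range(k))
        # One pooled test, plus k individual tests if the pool is positive.
        total += 1 + k if positive else 1
    return total / pools


if __name__ == "__main__":
    r, k = 0.01, 16  # arbitrary example values
    print(f"formula:    {expected_tests(r, k):.3f}")
    print(f"simulation: {simulate_tests_per_pool(r, k, pools=20_000):.3f}")
```

The two numbers should agree up to sampling noise, which shrinks as the number of simulated pools grows.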

The overall efficiency of the pooled testing method compared
to naive individual testing is *K*/*O*.

The sheet entitled "expected efficiency" in the aforementioned spreadsheet lists the expected efficiency improvements. The results show that for infection ratios of 1 in 400 or less, improvements of at least 10-fold in testing capacity can be expected.
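
The efficiency K/O is easy to recompute from the formulas above. The Python sketch below is not a copy of the spreadsheet; the function names and the list of pool sizes are chosen here for illustration, and it evaluates the efficiency at an infection ratio of 1 in 400 under the same assumptions as before (independent infections, an error-free test).

```python
def expected_tests(r: float, k: int) -> float:
    """O = 1 + K * (1 - C), with C = (1 - r)^K."""
    return 1 + k * (1 - (1 - r) ** k)


def efficiency(r: float, k: int) -> float:
    """People cleared per test, K / O, relative to individual testing."""
    return k / expected_tests(r, k)


if __name__ == "__main__":
    r = 1 / 400
    for k in (8, 16, 20, 32, 64):  # illustrative pool sizes
        print(f"K = {k:3d}  O = {expected_tests(r, k):5.2f}  K/O = {efficiency(r, k):5.2f}")
```

Under these assumptions the efficiency at 1 in 400 peaks at roughly ten-fold for pool sizes around 16 to 20 and falls off on either side.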

Pool sizes need to decrease as the contamination rate in a population increases. Moreover, for symptomatic individuals with high a priori infection probabilities, pooling is unsuitable.
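
Both statements can be illustrated by searching for the pool size that maximizes K/O at several contamination rates. The following Python sketch uses a brute-force search; the function names, the search limit `max_k`, and the example rates are assumptions made for illustration, and the model is the same idealized one used for the formulas (independent infections, an error-free test, one round of individual retesting).

```python
from typing import Optional


def efficiency(r: float, k: int) -> float:
    """K / O, with O = 1 + K * (1 - (1 - r)^K)."""
    return k / (1 + k * (1 - (1 - r) ** k))


def best_pool_size(r: float, max_k: int = 200) -> Optional[int]:
    """Pool size in 2..max_k with the highest efficiency.

    Returns None when no pool size beats individual testing (K/O <= 1).
    """
    best = max(range(2, max_k + 1), key=lambda k: efficiency(r, k))
    return best if efficiency(r, best) > 1.0 else None


if __name__ == "__main__":
    for r in (0.001, 0.01, 0.05, 0.1, 0.3, 0.5):  # example rates
        k = best_pool_size(r)
        if k is None:
            print(f"r = {r}: pooling does not beat individual testing")
        else:
            print(f"r = {r}: best K = {k}, K/O = {efficiency(r, k):.2f}")
```

In this model the best pool size shrinks steadily as r grows, and once the infection probability is high enough no pool size does better than testing everyone individually.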

**WARNING:** It goes without saying that when taking
samples, swabs must be unique to each individual so as not to
further disseminate the virus.