On pooled COVID-19 testing

by Ceki Gülcü, licensed under Creative Commons
created 2020-03-19, last updated on 2020-03-19

Pooled testing can significantly increase COVID-19 testing capacity using existing resources. The core idea is to mix samples taken from multiple individuals and run them as a single test. I first heard of the idea in a Times of Israel article.

The idea should be immediately recognizable to a computer scientist or a software engineer. Indeed, many abstract data structures use similar techniques. Bloom filters come to mind, as well as cumulative acknowledgments in TCP.

The idea of testing 64 people at a time while bearing the cost of only a single laboratory test is quite appealing.

Two questions come to mind. The first and most critical question is whether a single lab test is sensitive enough to detect the virus when samples from many individuals are mixed together. According to the Technion and Rambam Health Care Campus, the answer seems to be affirmative.

The second question is how to choose the right pool size. If the pool is large, a single positive sample causes the whole batch to be discarded. Knowing that someone in a pool of 1000 people is infected does not help much. We would like to size the pool such that many of the tested pools come back negative and a large number of people can be labeled as free of COVID-19.

If K is the pool size and D is the frequency at which we must discard pools, then K and D should be numerically close. Here is the intuitive explanation:

D larger than K

For the sake of example, assume a pool size K equal to 16 and D equal to 60. Then, on average, one batch in 60 would need to be re-tested individually. However, only 16 people are in each batch. It makes sense to increase K so as to test as many people as possible in the same pool.

K larger than D

When K is larger than D, increasing K further yields worse and worse returns. As the pool size increases, more and more pools need to be discarded, wasting resources.

What about actual probabilities?

Let C denote the probability that all samples in a pool are clean, that is, the probability that no one in the pool is infected and the entire pool tests negative. Assuming that individuals are infected independently of one another, C is given by the expression:

C = (1 - r)^K

where ^ denotes the power function and r denotes the proportion of contaminated individuals in a population.

The probability that the pool needs to be discarded is given by the expression:

1 - C

It follows that the frequency D at which we discard pools is given by the expression:

D = 1 / (1 - C)
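
As a rough illustration, the two expressions above can be computed in a few lines of Python. This is a minimal sketch: the function names and the contamination rates in the loop are illustrative assumptions, not figures taken from the spreadsheet.

```python
import math


def clean_probability(r: float, k: int) -> float:
    """C = (1 - r)^K: probability that all K samples in a pool are clean."""
    return (1.0 - r) ** k


def discard_frequency(r: float, k: int) -> float:
    """D = 1 / (1 - C): on average, one pool in D must be discarded."""
    c = clean_probability(r, k)
    if c >= 1.0:
        # No contamination at all (r = 0): no pool is ever discarded.
        return math.inf
    return 1.0 / (1.0 - c)


if __name__ == "__main__":
    # Hypothetical contamination rates, chosen only for illustration.
    for r in (1 / 2000, 1 / 400, 1 / 100):
        for k in (16, 32, 64):
            print(f"r={r:.4f} K={k:2d} D={discard_frequency(r, k):.1f}")
```

Reading the output row by row shows the rule of thumb at work: for a given r, the pool size worth considering is the one where D lands close to K.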

A separately available spreadsheet lists D for various contamination rates and pool sizes.

From this spreadsheet and given current contamination rates, it appears that the USA and Israel could use pools of size 64, South Korea and France pools of size 32, and Italy and Switzerland pools of size 16.

Actual efficiency

For pool size K, if all samples are clean, we have cleared K individuals with a single test. If not all samples are clean, then we need to test each individual one by one, for a total of 1 + K tests.

Let O denote the expected number of tests required to test K people. O is given by:

O = P(all clean) * 1 + P(not all clean)*(1+K)

However, P(all clean) is just C mentioned above. Thus, to test K individuals, the required number of operations is:

O = C + (1 - C) * (1+K)

Expanding the above expression yields:

O = 1 + K (1 - C)
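
The closed form for O can be sanity-checked with a small simulation. The sketch below is not part of the original analysis. It assumes a perfectly accurate test (no false positives or negatives) and independent infections, and the function names are hypothetical.

```python
import random


def simulate_tests_per_pool(r: float, k: int, pools: int, seed: int = 0) -> float:
    """Average number of lab tests per pool under two-stage pooled testing."""
    rng = random.Random(seed)
    total_tests = 0
    for _ in range(pools):
        # Each individual is infected independently with probability r.
        any_infected = any(rng.random() < r for _ in range(k))
        # One pooled test, plus K individual tests if the pool is positive.
        total_tests += 1 + (k if any_infected else 0)
    return total_tests / pools


def expected_tests(r: float, k: int) -> float:
    """Closed form O = 1 + K * (1 - C), with C = (1 - r)^K."""
    return 1.0 + k * (1.0 - (1.0 - r) ** k)


if __name__ == "__main__":
    r, k = 0.05, 8  # illustrative values only
    print("simulated:", simulate_tests_per_pool(r, k, pools=20000))
    print("formula:  ", expected_tests(r, k))
```

With enough simulated pools, the two printed numbers should agree closely.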

The overall efficiency of the method compared to naive individual testing is K/O.
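
The efficiency K/O is easy to tabulate, and scanning over K gives a candidate pool size for a given contamination rate. The following is a minimal sketch under the same assumptions as the formulas above. The helper names and the upper bound of 128 on the pool size are arbitrary choices made for illustration.

```python
def expected_tests(r: float, k: int) -> float:
    """O = 1 + K * (1 - C), with C = (1 - r)^K."""
    return 1.0 + k * (1.0 - (1.0 - r) ** k)


def efficiency(r: float, k: int) -> float:
    """K / O: people tested per lab test, relative to individual testing."""
    return k / expected_tests(r, k)


def best_pool_size(r: float, max_k: int = 128) -> int:
    """Pool size in 1..max_k with the highest expected efficiency."""
    return max(range(1, max_k + 1), key=lambda k: efficiency(r, k))


if __name__ == "__main__":
    r = 1 / 400  # illustrative contamination rate
    k = best_pool_size(r)
    print(f"r={r:.4f} best K={k} efficiency={efficiency(r, k):.2f}")
```

For r = 1/400 the best pool size found this way is around 20, with an efficiency slightly above 10.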

The sheet entitled "expected efficiency" in the aforementioned spreadsheet lists the expected efficiency improvements. The results show that for infection ratios of 1 in 400 or less, at least 10-fold improvements in testing capacity can be expected.


Pool sizes need to decrease as the contamination rate in a population increases. Moreover, for symptomatic individuals with a high a priori probability of infection, pooling is unsuitable.

WARNING: It goes without saying that when taking samples, swabs must be unique to each individual so as not to further disseminate the virus.