Health Observer Systems: A Comparative Review

I looked for every health data system that shares features with my Medical Snapshot proposal, something I’ve been working on since 2007. Such programmes medically measure people over time, store what they find, and don’t necessarily tell the individuals what their own data says. This article sets out what I learned about 26 systems, spanning nine decades from the Tuskegee Syphilis Study (1932) to Our Future Health (2022).

What I Measured and Why

This analysis compares existing systems against a hypothetical system, in a way that is exploratory and hand-waving rather than solid science.

The 18 features in the scoring matrix are derived from the Medical Snapshot’s design. That means the matrix measures how close each system comes to the Snapshot’s architecture rather than some kind of quality metric. A low-scoring system may be excellent at what it set out to do. As an example, UK Biobank scores only 7 out of 18 while being the most successful voluntary biobank in the world, so the matrix is clearly about evaluating a new idea, not assessing the quality of existing systems.

This matrix asks a narrow question: which of the Snapshot’s features has each existing system implemented? A different question should also be asked: how might we compare these systems with each other? I’m preparing a paper that asks this research question, among others.

Headline results

Plot 1 (scatter plot of 26 health data systems): research mechanics (x) vs core participant ethics (y, max 6). Research biobanks cluster at moderate mechanics and low ethics regardless of country or scale. The upper-right quadrant is empty except for the Medical Snapshot.

Plot 2 (scatter plot of 26 health data systems): research mechanics (x) vs full governance ethics (y, max 11). The bracket on the right marks nine governance features that no operational system has combined with high mechanical capability. I have no experimental evidence that these features are required, but some of them are discussed in the literature.

How the measuring works

The 18 features fall into six dimensions, ordered logically like this:

flowchart TD
    A["A: Data Collection<br/>What is gathered?"] --> B["B: Information Embargo<br/>Who sees it?"]
    B --> C["C: Retroactive Release<br/>When does it come back?"]
    C --> D["D: Directionality<br/>Who holds power?"]
    D --> E["E: Participation Ethics<br/>What's the deal?"]
    E --> F["F: Governance<br/>Who enforces the rules?"]

A: Data Collection (4 features) — does the system test broadly across many conditions, test healthy people, test the same people repeatedly over time, and retain biological samples? These are the operating mechanics.

B: Information Embargo (3 features) — does the system withhold results from participants, operate independently from their clinical care, and keep the specific tests conducted unknown to participants?

C: Retroactive Release (3 features) — does a formal pathway exist for releasing stored data when something happens later? Is it triggered by the treating doctor’s diagnosis rather than by researchers? Does the released data primarily benefit the individual?

D: Directionality (3 features) — can the participant or their doctor initiate release? Is state access structurally excluded? Did the participant choose to join?

E: Participation Ethics (2 features) — are participants paid? Does commercial early-access funding support the operation?

F: Governance (2 features + 1) — is eventual public data release mandatory? Do trustees have a legal duty to participants rather than to funders or governments? Is data stored across multiple legal jurisdictions?

A score of 1 means the system shares that feature with the Snapshot. A score of 0 means it doesn’t. A score of −1* means the system does the opposite of what the Snapshot intends — compulsory participation where the Snapshot requires voluntary, or state control where the Snapshot requires exclusion. Inversions count as 0 in the total but are marked separately because they carry information.
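The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used to build the matrix: the function name `score_row` and the way cells are spelled as strings are assumptions, and the ½ value follows the partial-score notation given under the scoring table.

```python
# Points per ordinary cell. Inversions ("-1*" or "-1*[C]") are handled
# separately: they add nothing to the total but are counted on their own.
POINTS = {"1": 1.0, "0": 0.0, "½": 0.5}


def score_row(cells):
    """Return (total, inversions) for one system's 18 cells."""
    total = 0.0
    inversions = 0
    for cell in cells:
        cell = cell.replace("−", "-")  # accept the typographic minus sign
        if cell.startswith("-1*"):
            inversions += 1  # counts as 0 in the total, tracked separately
        else:
            total += POINTS[cell]
    return total, inversions


# Two rows copied from the scoring table.
decode = "1 1 0 1 1 0 0 0 0 0 0 0 −1* 0 1 0 0 0".split()
epic = "1 1 1 1 1 ½ 0 0 0 0 0 0 1 0 0 1 0 0".split()

print("deCODE:", score_row(decode))
print("EPIC:", score_row(epic))
```

Returning the inversion count next to the total keeps the information that the asterisk carries, instead of folding it into a single number.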

What I searched, and the biases I found

The English-language literature on biobank ethics is large and well-indexed. The comparison started there: Framingham, UK Biobank, deCODE, FinnGen, the standard landscape.

If you only search in English, you find what English-speaking researchers have written about. Some bias is expected, but here the biases were so significant that I needed an additional strategy. I therefore ran further searches in Chinese, French, Spanish, and Russian, using native-language framing rather than translated English queries.

As an example: translating (via both machine translation and an obliging native-speaking scientist) the phrase “biobank observer study no feedback ethics” into Chinese produces results about what Chinese scholars have said about English-framed questions. That is interesting but completely unhelpful here.

However, searching for 健康医疗数据相关研究的伦理审查 (“ethics review of health data research”) produces what Chinese scholars are actually asking in their own terms, and returns a wealth of legal and ethical discussion.

In a similar vein, the French searches located something almost invisible in the English literature: the INSERM VolREthics charter and a two-volume IGAS report on cohort studies, the most developed framework I found anywhere for healthy volunteer ethics. The Spanish searches found the Gaceta Sanitaria literature on big-data health research ethics.

The Russian searches found primarily regulatory material — compliance-focused rather than critically engaged — which is itself a finding.

Source Credibility

Not all sources are equal, and some widely-cited ones are unreliable.

Sources are grouped into three tiers. Tier A is government primary documents — legislation, official statistics, institutional reports. Tier B is independent organisations with documented methodology — Citizen Lab, Human Rights Watch (with named limitations on interview-based claims). Tier C is advocacy-adjacent material where institutional funding creates structural conflicts of interest.

As a special case, the Australian Strategic Policy Institute (ASPI) is excluded at all tiers. This matters because ASPI’s 2020 genomic surveillance report is widely cited in the English literature regarding Chinese DNA collection. ASPI is funded by the Australian Department of Defence, the US State Department, NATO, and weapons manufacturers including BAE Systems, Lockheed Martin, and Raytheon. ASPI’s China-focused outputs consistently align with those funders’ interests, and the genomic report was part-funded by a US government strategic promotional organisation. The Australian government found ASPI had published “op-ed overreach” and “partisan commentary.” I have cited claims made in ASPI reports where they appear to be supported independently by organisations such as Human Rights Watch, Citizen Lab, or Chinese government sources. There’s a lot more work to be done to get a fair overview of this. I have tried to make a reasonable first pass.

The Scoring

Systems are ordered by total score, descending. Ties are ordered approximately by year of establishment.

System	A1	A2	A3	A4	B5	B6	B7	C8	C9	C10	D11	D12	D13	E14	E15	F16	F17	F18	Total
Medical Snapshot (reference)	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	18
DoD Serum Repository (USA, 1985)	0	1	1	1	1	1	1	1	0	1	0	0	0	0	0	0	0	0	8
EPIC (Europe, 1992)	1	1	1	1	1	½	0	0	0	0	0	0	1	0	0	1	0	0	7.5
China “Physicals for All” (2013–)	1	1	1	1	1	1	1	0	0	−1*[C]	−1*[C]	−1*	−1*[C]	0	0	0	−1*	0	7*
China Kadoorie Biobank (2004)	1	1	1	1	1	0	0	0	0	0	0	0	1	0	0	1	0	0	7
Taizhou (China, 2009)	1	1	1	1	1	0	0	0	0	0	0	0	1	0	0	1	0	0	7
UK Biobank (2006)	1	1	1	1	1	0	0	0	0	0	0	0	1	0	0	1	0	0	7
Framingham (USA, 1948)	1	1	1	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0	6
Whitehall I (UK, 1967)	1	1	0	1	1	0	0	0	0	0	0	0	1	0	0	1	0	0	6
Nurses’ Health Study (USA, 1976)	1	1	1	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0	6
ALSPAC (UK, 1991)	1	1	1	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0	6
BioBank Japan (2003)	1	0	1	1	1	0	0	0	0	0	0	0	1	0	0	1	0	0	6
All of Us (USA, 2018)	1	1	1	1	−1*	0	0	0	0	0	0	0	1	0	0	1	0	0	6
CNHBM (China, 2017)	1	1	1	1	1	0	0	0	0	0	0	0	1	0	0	0	0	0	6
Guthrie Card (global, 1963)	0	1	0	1	½	1	0	1	0	1	0	0	0	0	0	0	0	0	5.5
Generation Scotland (2006)	1	1	0	1	½	0	0	0	0	0	0	0	1	0	0	1	0	0	5.5
deCODE (Iceland, 1998)	1	1	0	1	1	0	0	0	0	0	0	0	−1*	0	1	0	0	0	5
Estonian Biobank (2001)	1	1	0	1	0	0	0	0	0	0	0	0	1	0	0	1	0	0	5
FinnGen (Finland, 2017)	1	1	0	1	0	0	0	0	0	0	0	0	1	0	1	0	0	0	5
Our Future Health (UK, 2022)	1	1	0	1	−1*	0	0	0	0	0	0	0	1	0	0	0	0	0	4
China CNGB / BGI GeneBank (2016)	0	1	0	1	1	1	0	0	0	0	0	−1*	0	0	0	0	0	0	4
NDNAD / CODIS (UK/USA, 1995/1998)	0	−1*	0	1	1	1	0	0	0	0	0	−1*	−1*	0	0	0	0	0	3
China MPS DNA database (2003)	0	−1*	0	1	1	1	0	0	0	0	0	−1*	−1*	0	0	0	−1*	0	3
Majengo Cohort (Kenya, 1985)	0	0	1	1	½	0	0	0	0	0	0	0	½	0	0	0	0	0	2.5
Tuskegee (USA, 1932)	0	−1*	1	0	−1*	−1*	1	0	0	−1*	−1*	−1*	−1*	0	0	0	−1*	0	2
NZ Unfortunate Experiment (1966)	0	−1*	1	0	−1*	−1*	0	0	0	−1*	−1*	−1*	−1*	0	0	0	−1*	0	1

Notation: 1 = feature present. 0 = absent. ½ = partial. −1* = inverted (system does the opposite); counts as 0 in total. −1*[C] = inverted but evidence rests on advocacy-adjacent inference only. ASPI excluded at all tiers.

What the Scores and Plots Show

The pattern is consistent across both plots. Research biobanks cluster at mechanics 3–5.5 and ethics around 2, regardless of where they are or how large they are. Framingham (1948, USA), UK Biobank (2006, UK), and the China Kadoorie Biobank (2004, China) sit in the same cluster despite being separated by six decades and radically different governance contexts.
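A rough way to reproduce a plot position from a table row is sketched below. The split is an assumption: the text does not say exactly which columns feed each axis, so the sketch treats A1 to B7 (7 features) as "mechanics" and C8 to F18 (11 features) as "full governance ethics", which is consistent with the stated maximum of 11 and the 3–5.5 mechanics range. The helper names `cell_value` and `plot_point` are hypothetical.

```python
# ASSUMED axis split: columns A1..B7 are "mechanics", C8..F18 "governance".
MECHANICS = slice(0, 7)
GOVERNANCE = slice(7, 18)


def cell_value(cell):
    """Map a table cell to points: 1 -> 1.0, ½ -> 0.5, everything else 0.0.

    "Everything else" covers plain 0 and inversions such as −1*, which
    count as 0 under the notation given with the table.
    """
    if cell == "1":
        return 1.0
    if cell == "½":
        return 0.5
    return 0.0


def plot_point(cells):
    """Return (mechanics, governance) for one system's 18 cells."""
    values = [cell_value(c) for c in cells]
    return sum(values[MECHANICS]), sum(values[GOVERNANCE])


# Rows copied from the scoring table.
rows = {
    "Framingham (USA, 1948)": "1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0",
    "China Kadoorie Biobank (2004)": "1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0",
    "UK Biobank (2006)": "1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0",
}
points = {name: plot_point(row.split()) for name, row in rows.items()}

for name, (x, y) in points.items():
    print(f"{name}: mechanics={x}, governance={y}")
```

Under this assumed split, the three systems land at mechanics 4 to 5 and governance 2, which matches the cluster described above.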

The DoD Serum Repository scores highest among operational systems (8/18). It predates the Snapshot proposal by 22 years. It matches the Snapshot on the information embargo, has a working retroactive release mechanism, and demonstrates individual clinical benefit. It fails on voluntariness (military participation is compulsory), state exclusion, payment, commercial funding, and public release.

The columns where the landscape is emptiest are C (retroactive release) and D (directionality). No operational system scores more than 2 out of 6 on those columns combined. In the matrix above, no operational system scores a point on C9 (clinically triggered release) or D11 (participant can initiate release). This is where the Snapshot’s novelty lies.
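The C and D comparison can be checked against individual table rows. The sketch below is illustrative only: `dimension_totals` is a hypothetical helper, the column labels are taken from the table header, and inversions are treated as 0, following the notation under the table.

```python
# Column order as in the scoring table header.
COLUMNS = [
    "A1", "A2", "A3", "A4", "B5", "B6", "B7", "C8", "C9", "C10",
    "D11", "D12", "D13", "E14", "E15", "F16", "F17", "F18",
]


def dimension_totals(cells):
    """Sum points per dimension letter (A to F) for one system's cells."""
    totals = {letter: 0.0 for letter in "ABCDEF"}
    for column, cell in zip(COLUMNS, cells):
        if cell == "1":
            totals[column[0]] += 1.0
        elif cell == "½":
            totals[column[0]] += 0.5
        # "0" and inversions (−1*) add nothing
    return totals


# Rows copied from the scoring table.
rows = {
    "DoD Serum Repository": "0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0",
    "Guthrie Card": "0 1 0 1 ½ 1 0 1 0 1 0 0 0 0 0 0 0 0",
    "UK Biobank": "1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0",
}
c_plus_d = {}
for name, row in rows.items():
    totals = dimension_totals(row.split())
    c_plus_d[name] = totals["C"] + totals["D"]
    print(f"{name}: C+D = {c_plus_d[name]}")
```

For these three rows the combined C and D score never exceeds 2, and the points come from C8, C10, or D13 rather than from C9 or D11.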

The Western forensic databases (NDNAD, CODIS) score 3/18 — the same as the China MPS DNA database. They share the same pattern: biological banking, no individual feedback, clinical care independence, with inversions on population type, state control, and voluntariness. Scoring them identically is deliberate.

Where Assumptions Are Embedded

Dimensions A and B are largely tradition-neutral. Whether a system tests healthy people repeatedly, retains samples, and withholds results is a structural fact that any ethical framework can assess. The finding that the DoD Serum Repository is the highest-scoring operational system, and that “Physicals for All” demonstrates the mechanics at population scale, holds regardless of which tradition you evaluate from.

Dimensions D, E, and F embed assumptions.

D13 (voluntary consent) uses a binary: chose to join, or didn’t. In contexts where community obligation rather than individual choice governs participation norms — Confucian healthcare settings, some African community consent frameworks — this binary is too coarse. The [C] notation flags where this matters, but the notation handles evidential uncertainty, not theoretical plurality.

E14 (payment) is positive in the Snapshot’s design and in the French healthy-volunteer tradition. It is contested in the African anti-extractivism tradition, where payment above expense reimbursement may compromise the voluntariness of consent.

F17 (governance accountable to participants) embeds English trust law. The concept exists in some form in most legal systems, but the specific model the Snapshot assumes is not universal.

Other designs emerging from other traditions might replace payment with community benefit-sharing, or replace trustee governance with state oversight backed by strong individual rights provisions. The matrix scores these alternatives as 0 because they don’t match the Snapshot’s design — not because they are worse.
