Epidemiology Knowledge Graph
Simon Frost
Introduction
Epidemiology requires integrating heterogeneous data — pathogens, hosts, clinical observations, geographic surveillance, intervention records, and contact tracing data — into a coherent knowledge model. Semantic Spacetime is well-suited for this: its typed arrow system naturally expresses causal chains (LEADSTO), containment hierarchies (CONTAINS), property annotations (EXPRESS), and similarity relationships (NEAR) across domains — without requiring a separate ontology language.
This vignette builds a comprehensive infectious disease knowledge graph for a fictitious multi-pathogen respiratory outbreak scenario, and demonstrates SemanticSpacetime.jl’s capabilities:
- N4L knowledge encoding — pathogens, hosts, clinical features, contacts
- Typed arrow semantics — SST’s spacetime types map naturally to epidemiology
- Graph traversal — contact tracing via forward/backward cones
- Path finding — transmission chain reconstruction
- Centrality & orbit analysis — identifying super-spreader nodes
- Weighted search — risk propagation through the contact network
- Context intelligence — inferring outbreak clusters
- Text search — syndromic surveillance queries
- Visualization — outbreak network analysis
The scenario models an outbreak involving three pathogens (an influenza variant, a novel coronavirus, and an RSV strain) circulating simultaneously across hospital wards and community settings, reflecting the challenges of real-world syndromic surveillance.
using SemanticSpacetime
SemanticSpacetime.reset_arrows!()
SemanticSpacetime.reset_contexts!()
add_mandatory_arrows!()
config_dir = let d = joinpath(@__DIR__, "..", "..", "..", "SSTorytime", "SSTconfig")
isdir(d) ? d : nothing
end
if config_dir !== nothing
st = SemanticSpacetime.N4LState()
for cf in read_config_files(config_dir)
SemanticSpacetime.parse_config_file(cf; st=st)
end
end
store = MemoryStore()
println("SemanticSpacetime.jl epidemiology vignette")
println("Arrows loaded: config_dir=$(config_dir !== nothing ? "yes" : "no")")SemanticSpacetime.jl epidemiology vignette
Arrows loaded: config_dir=yes1. Building the Outbreak Ontology in SST
In SST, the ontology is implicit in the arrow types and chapters rather than declared via OWL. Each entity lives in a typed chapter, and relationships use SST’s spacetime arrow semantics:
| RDF / OWL Concept | SST Equivalent |
|---|---|
rdf:type / OWL Class | Chapter membership |
rdfs:subClassOf | CONTAINS arrows ((contain)) |
| Object properties (causal) | LEADSTO arrows ((leads to), (then)) |
| Object properties (structural) | CONTAINS arrows ((has-pt), (consist)) |
| Data properties | EXPRESS arrows ((note), (data), (name)) |
owl:sameAs / similarity | NEAR arrows ((ll), (syn)) |
1a. Pathogens
# Create pathogen nodes in a "pathogens" chapter
flu = mem_vertex!(store, "Influenza A H3N2", "pathogens")
cov = mem_vertex!(store, "SARS-CoV-2 JN.1", "pathogens")
rsv = mem_vertex!(store, "RSV subtype B", "pathogens")
# Pathogen taxonomy — CONTAINS captures "is a kind of"
virus = mem_vertex!(store, "Virus", "pathogen taxonomy")
resp = mem_vertex!(store, "Respiratory pathogen", "pathogen taxonomy")
for p in [flu, cov, rsv]
mem_edge!(store, virus, "contain", p)
mem_edge!(store, resp, "contain", p)
end
# Pathogen properties — EXPRESS arrows for metadata
flu_r0 = mem_vertex!(store, "R0 = 1.3", "pathogen properties")
cov_r0 = mem_vertex!(store, "R0 = 2.5", "pathogen properties")
rsv_r0 = mem_vertex!(store, "R0 = 3.0", "pathogen properties")
mem_edge!(store, flu, "note", flu_r0)
mem_edge!(store, cov, "note", cov_r0)
mem_edge!(store, rsv, "note", rsv_r0)
# Pathogen families
for (p, fam) in [(flu, "Orthomyxoviridae"), (cov, "Coronaviridae"), (rsv, "Pneumoviridae")]
fam_node = mem_vertex!(store, fam, "pathogen taxonomy")
mem_edge!(store, fam_node, "contain", p)
end
println("Pathogens: $(length([flu, cov, rsv])) pathogens created")Pathogens: 3 pathogens created1b. Locations
# Location hierarchy using CONTAINS
locations = Dict{String, Any}()
for (id, name, cls) in [
("royal_infirmary", "Royal Infirmary", "Hospital"),
("general_hospital", "General Hospital", "Hospital"),
("care_home_A", "Sunrise Care Home", "Care Home"),
("care_home_B", "Meadowbrook Care Home", "Care Home"),
("primary_school", "Oakfield Primary School", "School"),
("secondary_school", "Riverside Academy", "School"),
]
loc = mem_vertex!(store, name, "locations")
locations[id] = loc
# Type classification via containment
cls_node = mem_vertex!(store, cls, "location types")
mem_edge!(store, cls_node, "contain", loc)
end
println("Locations: $(length(locations)) created")Locations: 6 created1c. Symptoms and Risk Factors
symptom_nodes = Dict{String, Any}()
for s in ["fever", "cough", "dyspnoea", "fatigue", "myalgia",
"sore throat", "rhinorrhoea", "headache", "wheeze",
"hypoxia", "tachypnoea"]
symptom_nodes[s] = mem_vertex!(store, s, "symptoms")
end
risk_nodes = Dict{String, Any}()
for rf in ["age over 65", "immunocompromised", "chronic lung disease",
"diabetes", "cardiovascular disease", "obesity",
"pregnancy", "healthcare worker"]
risk_nodes[rf] = mem_vertex!(store, rf, "risk factors")
end
println("$(length(symptom_nodes)) symptoms, $(length(risk_nodes)) risk factors")11 symptoms, 8 risk factors1d. Patients (25 patients in the outbreak)
patient_data = [
# (id, age_group, risk_factors, vaccination_status)
("P001", "65+", ["age over 65", "chronic lung disease"], "vaccinated"),
("P002", "65+", ["age over 65", "diabetes"], "vaccinated"),
("P003", "40-64", ["healthcare worker"], "vaccinated"),
("P004", "18-39", String[], "unvaccinated"),
("P005", "65+", ["age over 65", "cardiovascular disease"], "vaccinated"),
("P006", "5-17", String[], "vaccinated"),
("P007", "5-17", String[], "unvaccinated"),
("P008", "18-39", ["pregnancy"], "vaccinated"),
("P009", "40-64", ["obesity"], "unvaccinated"),
("P010", "65+", ["age over 65", "immunocompromised"], "vaccinated"),
("P011", "0-4", String[], "unvaccinated"),
("P012", "40-64", ["healthcare worker"], "vaccinated"),
("P013", "18-39", String[], "vaccinated"),
("P014", "65+", ["age over 65", "chronic lung disease"], "unvaccinated"),
("P015", "5-17", String[], "vaccinated"),
("P016", "40-64", ["diabetes", "obesity"], "unvaccinated"),
("P017", "18-39", ["healthcare worker"], "vaccinated"),
("P018", "65+", ["age over 65"], "vaccinated"),
("P019", "0-4", String[], "unvaccinated"),
("P020", "40-64", String[], "vaccinated"),
("P021", "18-39", String[], "unvaccinated"),
("P022", "65+", ["age over 65", "immunocompromised"], "vaccinated"),
("P023", "5-17", String[], "vaccinated"),
("P024", "40-64", ["chronic lung disease"], "vaccinated"),
("P025", "18-39", ["healthcare worker"], "vaccinated"),
]
patient_nodes = Dict{String, Any}()
for (id, age_group, risks, vax) in patient_data
p = mem_vertex!(store, id, "patients")
patient_nodes[id] = p
# Age group — EXPRESS property
age_node = mem_vertex!(store, "age group: $age_group", "demographics")
mem_edge!(store, p, "note", age_node)
# Vaccination status
vax_node = mem_vertex!(store, "vaccination: $vax", "interventions")
mem_edge!(store, p, "note", vax_node)
# Risk factors — CONTAINS (patient has risk factor)
for rf in risks
mem_edge!(store, p, "has-pt", risk_nodes[rf])
end
end
println("$(length(patient_nodes)) patients created")25 patients created2. Surveillance Data — Case Reports
Each case report links a patient to a pathogen, location, onset date, severity, and symptoms. In SST, these are chains of typed arrows emanating from a case-report hub node.
case_records = [
# (case_id, patient, pathogen_ref, location_ref, onset, severity, symptoms)
(1, "P001", flu, "care_home_A", "2025-01-15", "severe", "fever,cough,dyspnoea,myalgia"),
(2, "P002", flu, "care_home_A", "2025-01-16", "moderate", "fever,cough,fatigue"),
(3, "P003", flu, "royal_infirmary", "2025-01-17", "mild", "fever,cough,sore throat"),
(4, "P010", flu, "care_home_A", "2025-01-18", "severe", "fever,cough,dyspnoea,hypoxia"),
(5, "P018", flu, "care_home_B", "2025-01-19", "moderate", "fever,cough,myalgia,fatigue"),
(6, "P004", cov, "general_hospital","2025-01-20", "mild", "cough,sore throat,headache"),
(7, "P005", cov, "general_hospital","2025-01-21", "severe", "fever,cough,dyspnoea,hypoxia,fatigue"),
(8, "P009", cov, "general_hospital","2025-01-22", "moderate", "fever,cough,myalgia"),
(9, "P012", cov, "royal_infirmary", "2025-01-22", "mild", "cough,rhinorrhoea"),
(10, "P013", cov, "general_hospital","2025-01-23", "mild", "cough,sore throat"),
(11, "P006", rsv, "primary_school", "2025-01-14", "moderate", "cough,wheeze,rhinorrhoea"),
(12, "P007", rsv, "primary_school", "2025-01-15", "moderate", "cough,wheeze,fever"),
(13, "P011", rsv, "royal_infirmary", "2025-01-17", "severe", "cough,wheeze,dyspnoea,tachypnoea,hypoxia"),
(14, "P015", rsv, "secondary_school","2025-01-18", "mild", "cough,rhinorrhoea"),
(15, "P019", rsv, "royal_infirmary", "2025-01-19", "severe", "wheeze,dyspnoea,tachypnoea,hypoxia"),
(16, "P008", flu, "general_hospital","2025-01-20", "moderate", "fever,cough,myalgia"),
(17, "P014", flu, "care_home_B", "2025-01-21", "severe", "fever,cough,dyspnoea,hypoxia"),
(18, "P016", cov, "general_hospital","2025-01-24", "moderate", "fever,cough,fatigue,myalgia"),
(19, "P017", cov, "royal_infirmary", "2025-01-24", "mild", "cough,sore throat"),
(20, "P020", cov, "general_hospital","2025-01-25", "mild", "cough,rhinorrhoea,headache"),
(21, "P021", flu, "secondary_school","2025-01-22", "mild", "fever,cough,sore throat"),
(22, "P022", flu, "care_home_A", "2025-01-23", "severe", "fever,cough,dyspnoea,hypoxia,fatigue"),
(23, "P023", rsv, "primary_school", "2025-01-20", "mild", "cough,rhinorrhoea"),
(24, "P024", cov, "royal_infirmary", "2025-01-25", "moderate", "fever,cough,dyspnoea"),
(25, "P025", cov, "royal_infirmary", "2025-01-26", "mild", "cough,fatigue"),
]
# Store case-report hub nodes with star topology
case_nodes = Dict{Int, Any}()
patient_pathogen = Dict{String, Any}() # for later use
patient_severity = Dict{String, String}()
patient_onset = Dict{String, String}()
for (id, pid, pathogen, loc_id, onset, severity, symp_str) in case_records
# Case report hub node
case = mem_vertex!(store, "Case $id: $pid", "case reports")
case_nodes[id] = case
# Patient → Case (patient leads to case report)
mem_edge!(store, patient_nodes[pid], "then", case)
# Case → Pathogen (infected by — a causal relationship)
mem_edge!(store, case, "then", pathogen)
# Case → Location (case at location — containment)
mem_edge!(store, locations[loc_id], "contain", case)
# Case properties — EXPRESS
onset_node = mem_vertex!(store, "onset: $onset", "timeline")
mem_edge!(store, case, "note", onset_node)
sev_node = mem_vertex!(store, "severity: $severity", "clinical")
mem_edge!(store, case, "note", sev_node)
# Symptoms — each symptom is linked from the case
for s in split(symp_str, ",")
s = strip(String(s))
if haskey(symptom_nodes, s)
mem_edge!(store, case, "has-pt", symptom_nodes[s])
end
end
patient_pathogen[pid] = pathogen
patient_severity[pid] = severity
patient_onset[pid] = onset
end
println("$(length(case_records)) case reports with star topology")
println("Graph: $(node_count(store)) nodes, $(link_count(store)) links")25 case reports with star topology
Graph: 112 nodes, 592 links3. Contact Tracing and Transmission Events
In SST, contacts are NEAR relationships (proximity/similarity), while transmission events are LEADSTO (causal direction). This mirrors the fundamental spacetime distinction: spatial proximity ≠ causal ordering.
# Known contacts — bidirectional NEAR relationships
contacts = [
("P001", "P002"), ("P001", "P010"), ("P002", "P010"),
("P002", "P022"), ("P003", "P012"), ("P003", "P017"),
("P012", "P017"), ("P006", "P007"), ("P006", "P015"),
("P004", "P009"), ("P005", "P009"), ("P009", "P013"),
("P013", "P016"), ("P016", "P020"), ("P011", "P019"),
("P003", "P001"), ("P017", "P024"), ("P025", "P019"),
]
for (a, b) in contacts
mem_edge!(store, patient_nodes[a], "ll", patient_nodes[b])
end
# Known transmissions — directed LEADSTO arrows
transmissions = [
("P001", "P002"), ("P001", "P010"), ("P002", "P022"),
("P006", "P007"), ("P004", "P009"), ("P009", "P013"),
("P013", "P016"), ("P016", "P020"), ("P011", "P019"),
("P003", "P017"),
]
for (src, tgt) in transmissions
mem_edge!(store, patient_nodes[src], "then", patient_nodes[tgt])
end
println("$(length(contacts)) contact links (NEAR), $(length(transmissions)) transmissions (LEADSTO)")18 contact links (NEAR), 10 transmissions (LEADSTO)4. Outbreak Definitions
Each outbreak is a container that groups case reports by pathogen.
# Three concurrent outbreaks
flu_outbreak = mem_vertex!(store, "Influenza A H3N2 winter outbreak", "outbreaks")
cov_outbreak = mem_vertex!(store, "SARS-CoV-2 JN.1 nosocomial cluster", "outbreaks")
rsv_outbreak = mem_vertex!(store, "RSV-B paediatric wave", "outbreaks")
outbreak_map = Dict(flu => flu_outbreak, cov => cov_outbreak, rsv => rsv_outbreak)
for (id, pid, pathogen, _...) in case_records
mem_edge!(store, outbreak_map[pathogen], "contain", case_nodes[id])
end
println("Three outbreaks defined")
println("Total graph: $(node_count(store)) nodes, $(link_count(store)) links")Three outbreaks defined
Total graph: 115 nodes, 698 links5. Epidemiological Queries — SST Graph Traversal
5a. Cases per pathogen
In RDF, this would be a SPARQL GROUP BY query. In SST, we traverse containment arrows from each outbreak node.
println("Cases by pathogen:")
println("─"^40)
for (pathogen_node, outbreak_node) in [(flu, flu_outbreak), (cov, cov_outbreak), (rsv, rsv_outbreak)]
cone = forward_cone(store, outbreak_node.nptr; depth=1)
# Collect all unique destination nodes from cone paths
seen = Set{NodePtr}()
cases = 0
for path in cone.paths
for lnk in path
lnk.dst in seen && continue
push!(seen, lnk.dst)
n = mem_get_node(store, lnk.dst)
if n !== nothing && startswith(n.s, "Case ")
cases += 1
end
end
end
println(" $(rpad(pathogen_node.s, 25)) $cases cases")
endCases by pathogen:
────────────────────────────────────────
Influenza A H3N2 9 cases
SARS-CoV-2 JN.1 10 cases
RSV subtype B 6 cases5b. Severity distribution
println("Severity distribution:")
println(rpad("Pathogen", 25), rpad("Severe", 10), rpad("Moderate", 10), "Mild")
println("─"^55)
pathogen_names = Dict(flu => "Influenza A H3N2", cov => "SARS-CoV-2 JN.1", rsv => "RSV subtype B")
for (pnode, pname) in pathogen_names
counts = Dict("severe" => 0, "moderate" => 0, "mild" => 0)
for (_, pid, pathogen, _, _, severity, _) in case_records
if pathogen === pnode
counts[severity] = get(counts, severity, 0) + 1
end
end
println(rpad(pname, 25),
rpad(string(counts["severe"]), 10),
rpad(string(counts["moderate"]), 10),
counts["mild"])
endSeverity distribution:
Pathogen Severe Moderate Mild
───────────────────────────────────────────────────────
RSV subtype B 2 2 2
Influenza A H3N2 4 3 2
SARS-CoV-2 JN.1 1 3 65c. Cases per location
println("Cases by location:")
println("─"^40)
for (loc_id, loc_node) in sort(collect(locations), by=x->x.first)
# Count cases contained at this location
count = 0
for (_, _, _, lid, _, _, _) in case_records
if lid == loc_id
count += 1
end
end
label = first(split(loc_node.s, '\n'))
println(" $(rpad(label, 30)) $count cases")
endCases by location:
────────────────────────────────────────
Sunrise Care Home 4 cases
Meadowbrook Care Home 2 cases
General Hospital 7 cases
Oakfield Primary School 3 cases
Royal Infirmary 7 cases
Riverside Academy 2 cases5d. Most common symptoms
# Count symptom frequency across all case reports using graph links
println("Symptom frequency:")
println("─"^35)
symptom_freq = Dict{String, Int}()
for (_, _, _, _, _, _, symp_str) in case_records
for s in split(symp_str, ",")
s = strip(String(s))
symptom_freq[s] = get(symptom_freq, s, 0) + 1
end
end
for (s, f) in sort(collect(symptom_freq), by=x -> -x.second)
bar = "█"^f
println(" $(rpad(s, 15)) $bar $f")
endSymptom frequency:
───────────────────────────────────
cough ████████████████████████ 24
fever ██████████████ 14
dyspnoea ████████ 8
hypoxia ██████ 6
fatigue ██████ 6
sore throat █████ 5
myalgia █████ 5
rhinorrhoea █████ 5
wheeze ████ 4
tachypnoea ██ 2
headache ██ 25e. Vulnerable patients with severe outcomes
println("Severe cases with risk factors:")
println("─"^60)
for (id, age_group, risks, vax) in patient_data
if haskey(patient_severity, id) && patient_severity[id] == "severe" && !isempty(risks)
pathogen_name = if haskey(patient_pathogen, id)
patient_pathogen[id].s
else
"unknown"
end
println(" $id ($age_group) — $pathogen_name, risks: $(join(risks, ", "))")
end
endSevere cases with risk factors:
────────────────────────────────────────────────────────────
P001 (65+) — Influenza A H3N2, risks: age over 65, chronic lung disease
P005 (65+) — SARS-CoV-2 JN.1, risks: age over 65, cardiovascular disease
P010 (65+) — Influenza A H3N2, risks: age over 65, immunocompromised
P014 (65+) — Influenza A H3N2, risks: age over 65, chronic lung disease
P022 (65+) — Influenza A H3N2, risks: age over 65, immunocompromised5f. Healthcare worker cases
println("Healthcare worker cases (potential nosocomial transmission):")
println("─"^65)
hcw_ids = [id for (id, _, risks, _) in patient_data if "healthcare worker" in risks]
for hcw in sort(hcw_ids)
if haskey(patient_onset, hcw)
pname = patient_pathogen[hcw].s
# Find location
loc_name = ""
for (_, pid, _, lid, _, _, _) in case_records
if pid == hcw
loc_name = locations[lid].s
break
end
end
println(" $(patient_onset[hcw]) | $hcw | $pname | $loc_name")
end
endHealthcare worker cases (potential nosocomial transmission):
─────────────────────────────────────────────────────────────────
2025-01-17 | P003 | Influenza A H3N2 | Royal Infirmary
2025-01-22 | P012 | SARS-CoV-2 JN.1 | Royal Infirmary
2025-01-24 | P017 | SARS-CoV-2 JN.1 | Royal Infirmary
2025-01-26 | P025 | SARS-CoV-2 JN.1 | Royal Infirmary6. Contact Tracing — Forward Cones and Path Finding
SST’s forward cone is the natural analogue of SPARQL property paths. Starting from an index case, the cone traces all reachable nodes through causal (LEADSTO) arrows — exactly what epidemiological contact tracing needs.
6a. Contact chain from index case P001
# Forward cone from P001 — traces all reachable connections
cone_p001 = forward_cone(store, patient_nodes["P001"].nptr; depth=5)
# Collect all unique patient nodes reachable from P001
reachable = Set{String}()
for path in cone_p001.paths
for lnk in path
n = mem_get_node(store, lnk.dst)
if n !== nothing && n.chap == "patients" && n.s != "P001"
push!(reachable, n.s)
end
end
end
println("Contact/transmission chain from P001 ($(length(reachable)) patients reachable):")
for pid in sort(collect(reachable))
println(" → $pid")
endContact/transmission chain from P001 (3 patients reachable):
→ P002
→ P010
→ P0226b. Transmission chain — path finding
SST path solving traces specific routes through the graph, revealing the hop-by-hop transmission chain.
# Find paths from P004 (COVID index) to P020 (end of chain)
result = find_paths(store, patient_nodes["P004"].nptr, patient_nodes["P020"].nptr; max_depth=8)
if !isempty(result.paths)
path_nodes = result.paths[1] # first DAG path
println("Transmission chain P004 → P020 ($(length(path_nodes)) nodes, $(length(result.paths)) paths found):")
for (i, nptr) in enumerate(path_nodes)
n = mem_get_node(store, nptr)
if n !== nothing
label = first(split(n.s, '\n'))
prefix = i == 1 ? " " : " → "
println("$prefix$label [$(n.chap)]")
end
end
else
println("No DAG path found P004 → P020 ($(length(result.loops)) loop paths)")
endTransmission chain P004 → P020 (5 nodes, 1 paths found):
P004 [patients]
→ P009 [patients]
→ P013 [patients]
→ P016 [patients]
→ P020 [patients]6c. Backward cone — who could have infected P022?
# Backward cone traces incoming arrows — who led to this patient?
cone_back = backward_cone(store, patient_nodes["P022"].nptr; depth=3)
reachable_back = Set{String}()
for path in cone_back.paths
for lnk in path
n = mem_get_node(store, lnk.dst)
if n !== nothing && n.chap == "patients" && n.s != "P022"
push!(reachable_back, n.s)
end
end
end
println("Backward cone from P022 (potential sources):")
for pid in sort(collect(reachable_back))
println(" ← $pid")
endBackward cone from P022 (potential sources):
← P001
← P0027. Network Analysis — Identifying Super-Spreaders
7a. Adjacency matrix and centrality
# Build adjacency matrix from patient transmission subgraph
adj = SemanticSpacetime.AdjacencyMatrix()
for (src, tgt) in transmissions
src_ptr = patient_nodes[src].nptr
tgt_ptr = patient_nodes[tgt].nptr
SemanticSpacetime.add_edge!(adj, src_ptr, tgt_ptr, 1.0)
end
# Add contact edges with lower weight
for (a, b) in contacts
a_ptr = patient_nodes[a].nptr
b_ptr = patient_nodes[b].nptr
SemanticSpacetime.add_edge!(adj, a_ptr, b_ptr, 0.3)
end
println(graph_summary(adj))Graph Summary
─────────────
Nodes: 20
Links: 18 (directed)
Sources: 6
Sinks: 7
Top centrality:
(1,40) 0.0500
(1,21) 0.0500
(1,27) 0.0500
(1,26) 0.0500
(1,42) 0.05007b. Eigenvector centrality — who is most connected?
if !isempty(adj.nodes)
centrality = eigenvector_centrality(adj; max_iter=100, tol=1e-6)
ranked = sort(collect(centrality), by=x -> x.second, rev=true)
println("Patient centrality (transmission + contact network):")
println("─"^45)
for (nptr, score) in ranked[1:min(10, length(ranked))]
node = mem_get_node(store, nptr)
if node !== nothing
pid = node.s
sev = get(patient_severity, pid, "?")
println(" $(rpad(pid, 8)) score=$(rpad(round(score, digits=4), 8)) severity=$sev")
end
end
endPatient centrality (transmission + contact network):
─────────────────────────────────────────────
P020 score=0.05 severity=mild
P001 score=0.05 severity=severe
P007 score=0.05 severity=moderate
P006 score=0.05 severity=moderate
P022 score=0.05 severity=severe
P015 score=0.05 severity=mild
P010 score=0.05 severity=severe
P004 score=0.05 severity=mild
P017 score=0.05 severity=mild
P003 score=0.05 severity=mild7c. Sources and sinks — index cases and endpoints
# Build a directed-only adjacency matrix for transmission
adj_tx = SemanticSpacetime.AdjacencyMatrix()
for (src, tgt) in transmissions
SemanticSpacetime.add_edge!(adj_tx, patient_nodes[src].nptr, patient_nodes[tgt].nptr, 1.0)
end
sources = find_sources(adj_tx)
sinks = find_sinks(adj_tx)
println("Index cases (transmission sources):")
for nptr in sources
n = mem_get_node(store, nptr)
n !== nothing && println(" ⚠ $(n.s) — $(get(patient_pathogen, n.s, nothing) !== nothing ? patient_pathogen[n.s].s : "?")")
end
println("\nEndpoint cases (transmission sinks):")
for nptr in sinks
n = mem_get_node(store, nptr)
n !== nothing && println(" ○ $(n.s)")
endIndex cases (transmission sources):
⚠ P001 — Influenza A H3N2
⚠ P003 — Influenza A H3N2
⚠ P004 — SARS-CoV-2 JN.1
⚠ P006 — RSV subtype B
⚠ P011 — RSV subtype B
Endpoint cases (transmission sinks):
○ P007
○ P010
○ P017
○ P019
○ P020
○ P0228. Risk Inference — SST Contextual Analysis
Instead of Datalog reasoning, SST uses contextual analysis and graph traversal to infer risk categories. The spacetime type system provides a natural framework: LEADSTO arrows carry causal weight, while CONTAINS and EXPRESS arrows provide structural context.
8a. Risk classification
println("Risk Classification")
println("════════════════════")
# High-risk elderly: age 65+ with severe outcome
println("\nHigh-risk elderly (65+ with severe outcome):")
for (id, age_group, risks, vax) in patient_data
if age_group == "65+" && get(patient_severity, id, "") == "severe"
println(" ⚠ $id — risks: $(isempty(risks) ? "none" : join(risks, ", "))")
end
end
# High-risk paediatric: age 0-4 with severe outcome
println("\nHigh-risk paediatric (0-4 with severe outcome):")
for (id, age_group, _, _) in patient_data
if age_group == "0-4" && get(patient_severity, id, "") == "severe"
println(" ⚠ $id")
end
end
# Potential super-spreaders: patients who transmitted to ≥2 others
tx_count = Dict{String, Int}()
for (src, _) in transmissions
tx_count[src] = get(tx_count, src, 0) + 1
end
println("\nPotential super-spreaders (transmitted to ≥2):")
for (pid, count) in sort(collect(tx_count), by=x -> -x.second)
if count >= 2
println(" ⚠ $pid — infected $count others")
end
end
# HCW-exposed patients
println("\nPatients exposed via healthcare worker contact:")
for (a, b) in contacts
if "healthcare worker" in [rf for (id, _, rfs, _) in patient_data if id == a for rf in rfs]
println(" → $b (exposed via HCW $a)")
end
endRisk Classification
════════════════════
High-risk elderly (65+ with severe outcome):
⚠ P001 — risks: age over 65, chronic lung disease
⚠ P005 — risks: age over 65, cardiovascular disease
⚠ P010 — risks: age over 65, immunocompromised
⚠ P014 — risks: age over 65, chronic lung disease
⚠ P022 — risks: age over 65, immunocompromised
High-risk paediatric (0-4 with severe outcome):
⚠ P011
⚠ P019
Potential super-spreaders (transmitted to ≥2):
⚠ P001 — infected 2 others
Patients exposed via healthcare worker contact:
→ P012 (exposed via HCW P003)
→ P017 (exposed via HCW P003)
→ P017 (exposed via HCW P012)
→ P001 (exposed via HCW P003)
→ P024 (exposed via HCW P017)
→ P019 (exposed via HCW P025)8b. Serial interval analysis
println("Serial Interval Analysis")
println("════════════════════════")
println(rpad("Pair", 16), rpad("Infector", 10), rpad("Infectee", 10),
rpad("SI (days)", 12), "Class")
println("─"^58)
onset_day = Dict{String, Int}()
for (_, pid, _, _, onset, _, _) in case_records
onset_day[pid] = parse(Int, split(onset, "-")[3])
end
for (i, (src, tgt)) in enumerate(transmissions)
if haskey(onset_day, src) && haskey(onset_day, tgt)
si = onset_day[tgt] - onset_day[src]
cls = si < 2 ? "short" : (si < 5 ? "typical" : "long")
println(rpad("pair$i", 16), rpad(src, 10), rpad(tgt, 10),
rpad(string(si), 12), cls)
end
endSerial Interval Analysis
════════════════════════
Pair Infector Infectee SI (days) Class
──────────────────────────────────────────────────────────
pair1 P001 P002 1 short
pair2 P001 P010 3 typical
pair3 P002 P022 7 long
pair4 P006 P007 1 short
pair5 P004 P009 2 typical
pair6 P009 P013 1 short
pair7 P013 P016 1 short
pair8 P016 P020 1 short
pair9 P011 P019 2 typical
pair10 P003 P017 7 long8c. Clinical decision support
println("Clinical Decision Support")
println("═════════════════════════")
actions = Dict{String, Vector{String}}()
for (id, age_group, risks, _) in patient_data
sev = get(patient_severity, id, "")
n_contacts = count(c -> c[1] == id || c[2] == id, contacts)
# Isolation required — severe with contacts
if sev == "severe" && n_contacts >= 1
push!(get!(actions, "ISOLATION REQUIRED", String[]), id)
end
# ICU referral — severe + elderly
if sev == "severe" && age_group == "65+"
push!(get!(actions, "ICU REFERRAL", String[]), id)
end
# Contact tracing priority — ≥3 contacts
if n_contacts >= 3
push!(get!(actions, "CONTACT TRACING PRIORITY", String[]), id)
end
# Paediatric escalation — severe + under 5
if sev == "severe" && age_group == "0-4"
push!(get!(actions, "PAEDIATRIC ESCALATION", String[]), id)
end
end
for (action, pids) in sort(collect(actions))
println("\n $action:")
for p in sort(pids)
println(" → $p")
end
endClinical Decision Support
═════════════════════════
CONTACT TRACING PRIORITY:
→ P001
→ P002
→ P003
→ P009
→ P017
ICU REFERRAL:
→ P001
→ P005
→ P010
→ P014
→ P022
ISOLATION REQUIRED:
→ P001
→ P005
→ P010
→ P011
→ P019
→ P022
PAEDIATRIC ESCALATION:
→ P011
→ P0199. Weighted Search — Risk Propagation
SST’s weighted search propagates risk scores through the network, modelling how outbreak risk diffuses through contact chains.
# Weighted search from index case P001
wpaths = weighted_search(store, patient_nodes["P001"].nptr; max_depth=4)
ranked = rank_by_weight(wpaths)
println("Risk propagation from index case P001 ($(length(ranked)) paths found):")
println("─"^55)
seen = Set{String}()
for wp in ranked[1:min(15, length(ranked))]
# Show the endpoint of each path
endpoint = last(wp.nodes)
n = mem_get_node(store, endpoint)
if n !== nothing && n.chap == "patients" && n.s ∉ seen && n.s != "P001"
push!(seen, n.s)
sev = get(patient_severity, n.s, "?")
println(" $(rpad(n.s, 8)) path_weight=$(rpad(round(wp.total_weight, digits=2), 8)) severity=$sev")
end
endRisk propagation from index case P001 (3368 paths found):
───────────────────────────────────────────────────────
P005 path_weight=4.0 severity=severe
P014 path_weight=4.0 severity=severe
P018 path_weight=4.0 severity=moderate
P022 path_weight=4.0 severity=severe10. Text Search — Syndromic Surveillance
SST’s text search enables natural-language-style queries, useful for syndromic surveillance where clinicians search for symptom patterns.
# Search for respiratory distress indicators
for query in ["dyspnoea", "hypoxia", "wheeze", "severe"]
results = mem_search_text(store, query)
titles = filter(n -> !occursin('\n', n.s) && length(n.s) < 80, results)
println("Search '$query': $(length(titles)) matches")
for n in titles[1:min(3, length(titles))]
println(" '$(n.s)' [$(n.chap)]")
end
endSearch 'dyspnoea': 1 matches
'dyspnoea' [symptoms]
Search 'hypoxia': 1 matches
'hypoxia' [symptoms]
Search 'wheeze': 1 matches
'wheeze' [symptoms]
Search 'severe': 1 matches
'severity: severe' [clinical]11. Epidemic Curve
dates = sort(unique([onset for (_, _, _, _, onset, _, _) in case_records]))
pathogen_short = Dict(flu => "Flu", cov => "CoV", rsv => "RSV")
println("Epidemic Curve (onset dates)")
println("════════════════════════════")
for d in dates
day_cases = [(pid, p) for (_, pid, p, _, onset, _, _) in case_records if onset == d]
markers = join([pathogen_short[p] for (_, p) in day_cases], " ")
bar = "█"^length(day_cases)
println(" $d | $bar $(length(day_cases)) ($markers)")
end
println("\nLegend: Flu=Influenza, CoV=SARS-CoV-2, RSV=RSV-B")Epidemic Curve (onset dates)
════════════════════════════
2025-01-14 | █ 1 (RSV)
2025-01-15 | ██ 2 (Flu RSV)
2025-01-16 | █ 1 (Flu)
2025-01-17 | ██ 2 (Flu RSV)
2025-01-18 | ██ 2 (Flu RSV)
2025-01-19 | ██ 2 (Flu RSV)
2025-01-20 | ███ 3 (CoV Flu RSV)
2025-01-21 | ██ 2 (CoV Flu)
2025-01-22 | ███ 3 (CoV CoV Flu)
2025-01-23 | ██ 2 (CoV Flu)
2025-01-24 | ██ 2 (CoV CoV)
2025-01-25 | ██ 2 (CoV CoV)
2025-01-26 | █ 1 (CoV)
Legend: Flu=Influenza, CoV=SARS-CoV-2, RSV=RSV-B12. Location–Pathogen Heatmap
loc_names_sorted = sort(collect(keys(locations)))
pathogen_list = [("Influenza", flu), ("SARS-CoV-2", cov), ("RSV-B", rsv)]
println(rpad("Location", 28), join([rpad(p, 14) for (p, _) in pathogen_list]))
println("─"^(28 + 14 * length(pathogen_list)))
for loc_id in loc_names_sorted
loc_name = locations[loc_id].s
print(rpad(loc_name[1:min(27, length(loc_name))], 28))
for (_, pnode) in pathogen_list
n = count(r -> r[3] === pnode && r[4] == loc_id, case_records)
cell = n == 0 ? " ·" : " $(repeat("█", n)) $n"
print(rpad(cell, 14))
end
println()
endLocation Influenza SARS-CoV-2 RSV-B
──────────────────────────────────────────────────────────────────────
Sunrise Care Home ████ 4 · ·
Meadowbrook Care Home ██ 2 · ·
General Hospital █ 1 ██████ 6 ·
Oakfield Primary School · · ███ 3
Royal Infirmary █ 1 ████ 4 ██ 2
Riverside Academy █ 1 · █ 113. Chapter Distribution
SST chapters provide a natural typing system. The distribution of nodes across chapters reveals the knowledge graph’s structure.
chap_counts = Dict{String,Int}()
for (_, node) in store.nodes
chap_counts[node.chap] = get(chap_counts, node.chap, 0) + 1
end
println("Chapter distribution:")
for (chap, count) in sort(collect(chap_counts), by=x -> -x.second)
bar = "█"^min(count, 40)
println(" $(rpad(chap, 25)) $bar $count")
endChapter distribution:
case reports █████████████████████████ 25
patients █████████████████████████ 25
timeline █████████████ 13
symptoms ███████████ 11
risk factors ████████ 8
locations ██████ 6
demographics █████ 5
pathogen taxonomy █████ 5
pathogens ███ 3
location types ███ 3
pathogen properties ███ 3
clinical ███ 3
outbreaks ███ 3
interventions ██ 214. Graph Statistics
# Full graph adjacency matrix
adj_full = SemanticSpacetime.AdjacencyMatrix()
for (nptr, node) in store.nodes
for links in node.incidence
for link in links
SemanticSpacetime.add_edge!(adj_full, nptr, link.dst, Float64(link.wgt))
end
end
end
println(graph_summary(adj_full))Graph Summary
─────────────
Nodes: 115
Links: 678 (directed)
Sources: 0
Sinks: 0
Top centrality:
(1,8) 1.0000
(1,7) 0.6577
(3,22) 0.4876
(3,37) 0.4785
(3,1) 0.4667Summary
This vignette demonstrated how SemanticSpacetime.jl naturally models epidemiological knowledge using its spacetime type system:
| SST Feature | Epidemiological Application |
|---|---|
| Chapters | Entity typing: patients, pathogens, locations, symptoms |
LEADSTO arrows (then) | Causal transmission chains, temporal ordering |
CONTAINS arrows (contain, has-pt) | Location containment, symptom membership, outbreak grouping |
EXPRESS arrows (note) | Metadata: onset dates, severity, vaccination status |
NEAR arrows (ll) | Contact tracing (spatial proximity without causation) |
| Forward/backward cones | Contact tracing chains (analogue of SPARQL property paths) |
| Path finding | Transmission chain reconstruction |
| Eigenvector centrality | Super-spreader identification |
| Weighted search | Risk propagation through the network |
| Text search | Syndromic surveillance queries |
| Adjacency analysis | Sources (index cases), sinks (endpoints), degree distribution |
SST vs RDF for Epidemiology
The SST approach differs from RDF in several key ways:
- No separate ontology language: The arrow type system (LEADSTO, CONTAINS, EXPRESS, NEAR) provides built-in semantics without needing OWL declarations.
- Spacetime semantics: The distinction between causal arrows (LEADSTO) and proximity arrows (NEAR) maps directly to the epidemiological distinction between transmission (causal) and contact (spatial).
- Graph-native queries: Instead of SPARQL, SST uses forward/backward cones, weighted search, and path finding — operations optimised for knowledge graph traversal.
- Chapters as types: Chapter membership provides lightweight typing without the overhead of class hierarchies.
Key epidemiological findings from the scenario:
- Influenza A H3N2 clustered in care homes (4 cases at Sunrise Care Home alone), with severe outcomes among elderly patients with comorbidities.
- SARS-CoV-2 JN.1 formed a nosocomial chain at General Hospital (P004→P009→P013→P016→P020) amplified by healthcare worker contacts.
- RSV-B primarily affected children in school settings, with two severe paediatric cases requiring hospitalisation.
- Network centrality identified P001 and P009 as key nodes in the transmission network, consistent with their roles as index cases in the care home and hospital clusters respectively.