Epidemiology Knowledge Graph

Simon Frost

Introduction

Epidemiology requires integrating heterogeneous data — pathogens, hosts, clinical observations, geographic surveillance, intervention records, and contact tracing data — into a coherent knowledge model. Semantic Spacetime is well-suited for this: its typed arrow system naturally expresses causal chains (LEADSTO), containment hierarchies (CONTAINS), property annotations (EXPRESS), and similarity relationships (NEAR) across domains — without requiring a separate ontology language.

This vignette builds a comprehensive infectious disease knowledge graph for a fictitious multi-pathogen respiratory outbreak scenario, and demonstrates SemanticSpacetime.jl’s capabilities:

  1. N4L knowledge encoding — pathogens, hosts, clinical features, contacts
  2. Typed arrow semantics — SST’s spacetime types map naturally to epidemiology
  3. Graph traversal — contact tracing via forward/backward cones
  4. Path finding — transmission chain reconstruction
  5. Centrality & orbit analysis — identifying super-spreader nodes
  6. Weighted search — risk propagation through the contact network
  7. Context intelligence — inferring outbreak clusters
  8. Text search — syndromic surveillance queries
  9. Visualization — outbreak network analysis

The scenario models an outbreak involving three pathogens (an influenza variant, a novel coronavirus, and an RSV strain) circulating simultaneously across hospital wards and community settings, reflecting the challenges of real-world syndromic surveillance.

using SemanticSpacetime

SemanticSpacetime.reset_arrows!()
SemanticSpacetime.reset_contexts!()
add_mandatory_arrows!()

config_dir = let d = joinpath(@__DIR__, "..", "..", "..", "SSTorytime", "SSTconfig")
    isdir(d) ? d : nothing
end
if config_dir !== nothing
    st = SemanticSpacetime.N4LState()
    for cf in read_config_files(config_dir)
        SemanticSpacetime.parse_config_file(cf; st=st)
    end
end

store = MemoryStore()
println("SemanticSpacetime.jl epidemiology vignette")
println("Arrows loaded: config_dir=$(config_dir !== nothing ? "yes" : "no")")
SemanticSpacetime.jl epidemiology vignette
Arrows loaded: config_dir=yes

1. Building the Outbreak Ontology in SST

In SST, the ontology is implicit in the arrow types and chapters rather than declared via OWL. Each entity lives in a typed chapter, and relationships use SST’s spacetime arrow semantics:

RDF / OWL ConceptSST Equivalent
rdf:type / OWL ClassChapter membership
rdfs:subClassOfCONTAINS arrows ((contain))
Object properties (causal)LEADSTO arrows ((leads to), (then))
Object properties (structural)CONTAINS arrows ((has-pt), (consist))
Data propertiesEXPRESS arrows ((note), (data), (name))
owl:sameAs / similarityNEAR arrows ((ll), (syn))

1a. Pathogens

# Create pathogen nodes in a "pathogens" chapter
flu   = mem_vertex!(store, "Influenza A H3N2",  "pathogens")
cov   = mem_vertex!(store, "SARS-CoV-2 JN.1",  "pathogens")
rsv   = mem_vertex!(store, "RSV subtype B",     "pathogens")

# Pathogen taxonomy — CONTAINS captures "is a kind of"
virus = mem_vertex!(store, "Virus",     "pathogen taxonomy")
resp  = mem_vertex!(store, "Respiratory pathogen", "pathogen taxonomy")

for p in [flu, cov, rsv]
    mem_edge!(store, virus, "contain", p)
    mem_edge!(store, resp, "contain", p)
end

# Pathogen properties — EXPRESS arrows for metadata
flu_r0  = mem_vertex!(store, "R0 = 1.3",  "pathogen properties")
cov_r0  = mem_vertex!(store, "R0 = 2.5",  "pathogen properties")
rsv_r0  = mem_vertex!(store, "R0 = 3.0",  "pathogen properties")
mem_edge!(store, flu, "note", flu_r0)
mem_edge!(store, cov, "note", cov_r0)
mem_edge!(store, rsv, "note", rsv_r0)

# Pathogen families
for (p, fam) in [(flu, "Orthomyxoviridae"), (cov, "Coronaviridae"), (rsv, "Pneumoviridae")]
    fam_node = mem_vertex!(store, fam, "pathogen taxonomy")
    mem_edge!(store, fam_node, "contain", p)
end

println("Pathogens: $(length([flu, cov, rsv])) pathogens created")
Pathogens: 3 pathogens created

1b. Locations

# Location hierarchy using CONTAINS
locations = Dict{String, Any}()
for (id, name, cls) in [
    ("royal_infirmary",  "Royal Infirmary",        "Hospital"),
    ("general_hospital", "General Hospital",        "Hospital"),
    ("care_home_A",      "Sunrise Care Home",       "Care Home"),
    ("care_home_B",      "Meadowbrook Care Home",   "Care Home"),
    ("primary_school",   "Oakfield Primary School",  "School"),
    ("secondary_school", "Riverside Academy",         "School"),
]
    loc = mem_vertex!(store, name, "locations")
    locations[id] = loc
    # Type classification via containment
    cls_node = mem_vertex!(store, cls, "location types")
    mem_edge!(store, cls_node, "contain", loc)
end

println("Locations: $(length(locations)) created")
Locations: 6 created

1c. Symptoms and Risk Factors

symptom_nodes = Dict{String, Any}()
for s in ["fever", "cough", "dyspnoea", "fatigue", "myalgia",
          "sore throat", "rhinorrhoea", "headache", "wheeze",
          "hypoxia", "tachypnoea"]
    symptom_nodes[s] = mem_vertex!(store, s, "symptoms")
end

risk_nodes = Dict{String, Any}()
for rf in ["age over 65", "immunocompromised", "chronic lung disease",
           "diabetes", "cardiovascular disease", "obesity",
           "pregnancy", "healthcare worker"]
    risk_nodes[rf] = mem_vertex!(store, rf, "risk factors")
end

println("$(length(symptom_nodes)) symptoms, $(length(risk_nodes)) risk factors")
11 symptoms, 8 risk factors

1d. Patients (25 patients in the outbreak)

patient_data = [
    # (id, age_group, risk_factors, vaccination_status)
    ("P001", "65+",   ["age over 65", "chronic lung disease"],    "vaccinated"),
    ("P002", "65+",   ["age over 65", "diabetes"],                "vaccinated"),
    ("P003", "40-64", ["healthcare worker"],                      "vaccinated"),
    ("P004", "18-39", String[],                                   "unvaccinated"),
    ("P005", "65+",   ["age over 65", "cardiovascular disease"],  "vaccinated"),
    ("P006", "5-17",  String[],                                   "vaccinated"),
    ("P007", "5-17",  String[],                                   "unvaccinated"),
    ("P008", "18-39", ["pregnancy"],                              "vaccinated"),
    ("P009", "40-64", ["obesity"],                                "unvaccinated"),
    ("P010", "65+",   ["age over 65", "immunocompromised"],       "vaccinated"),
    ("P011", "0-4",   String[],                                   "unvaccinated"),
    ("P012", "40-64", ["healthcare worker"],                      "vaccinated"),
    ("P013", "18-39", String[],                                   "vaccinated"),
    ("P014", "65+",   ["age over 65", "chronic lung disease"],    "unvaccinated"),
    ("P015", "5-17",  String[],                                   "vaccinated"),
    ("P016", "40-64", ["diabetes", "obesity"],                    "unvaccinated"),
    ("P017", "18-39", ["healthcare worker"],                      "vaccinated"),
    ("P018", "65+",   ["age over 65"],                            "vaccinated"),
    ("P019", "0-4",   String[],                                   "unvaccinated"),
    ("P020", "40-64", String[],                                   "vaccinated"),
    ("P021", "18-39", String[],                                   "unvaccinated"),
    ("P022", "65+",   ["age over 65", "immunocompromised"],       "vaccinated"),
    ("P023", "5-17",  String[],                                   "vaccinated"),
    ("P024", "40-64", ["chronic lung disease"],                   "vaccinated"),
    ("P025", "18-39", ["healthcare worker"],                      "vaccinated"),
]

patient_nodes = Dict{String, Any}()
for (id, age_group, risks, vax) in patient_data
    p = mem_vertex!(store, id, "patients")
    patient_nodes[id] = p
    # Age group — EXPRESS property
    age_node = mem_vertex!(store, "age group: $age_group", "demographics")
    mem_edge!(store, p, "note", age_node)
    # Vaccination status
    vax_node = mem_vertex!(store, "vaccination: $vax", "interventions")
    mem_edge!(store, p, "note", vax_node)
    # Risk factors — CONTAINS (patient has risk factor)
    for rf in risks
        mem_edge!(store, p, "has-pt", risk_nodes[rf])
    end
end

println("$(length(patient_nodes)) patients created")
25 patients created

2. Surveillance Data — Case Reports

Each case report links a patient to a pathogen, location, onset date, severity, and symptoms. In SST, these are chains of typed arrows emanating from a case-report hub node.

case_records = [
    # (case_id, patient, pathogen_ref, location_ref, onset, severity, symptoms)
    (1,  "P001", flu, "care_home_A",     "2025-01-15", "severe",   "fever,cough,dyspnoea,myalgia"),
    (2,  "P002", flu, "care_home_A",     "2025-01-16", "moderate", "fever,cough,fatigue"),
    (3,  "P003", flu, "royal_infirmary", "2025-01-17", "mild",     "fever,cough,sore throat"),
    (4,  "P010", flu, "care_home_A",     "2025-01-18", "severe",   "fever,cough,dyspnoea,hypoxia"),
    (5,  "P018", flu, "care_home_B",     "2025-01-19", "moderate", "fever,cough,myalgia,fatigue"),
    (6,  "P004", cov, "general_hospital","2025-01-20", "mild",     "cough,sore throat,headache"),
    (7,  "P005", cov, "general_hospital","2025-01-21", "severe",   "fever,cough,dyspnoea,hypoxia,fatigue"),
    (8,  "P009", cov, "general_hospital","2025-01-22", "moderate", "fever,cough,myalgia"),
    (9,  "P012", cov, "royal_infirmary", "2025-01-22", "mild",     "cough,rhinorrhoea"),
    (10, "P013", cov, "general_hospital","2025-01-23", "mild",     "cough,sore throat"),
    (11, "P006", rsv, "primary_school",  "2025-01-14", "moderate", "cough,wheeze,rhinorrhoea"),
    (12, "P007", rsv, "primary_school",  "2025-01-15", "moderate", "cough,wheeze,fever"),
    (13, "P011", rsv, "royal_infirmary", "2025-01-17", "severe",   "cough,wheeze,dyspnoea,tachypnoea,hypoxia"),
    (14, "P015", rsv, "secondary_school","2025-01-18", "mild",     "cough,rhinorrhoea"),
    (15, "P019", rsv, "royal_infirmary", "2025-01-19", "severe",   "wheeze,dyspnoea,tachypnoea,hypoxia"),
    (16, "P008", flu, "general_hospital","2025-01-20", "moderate", "fever,cough,myalgia"),
    (17, "P014", flu, "care_home_B",     "2025-01-21", "severe",   "fever,cough,dyspnoea,hypoxia"),
    (18, "P016", cov, "general_hospital","2025-01-24", "moderate", "fever,cough,fatigue,myalgia"),
    (19, "P017", cov, "royal_infirmary", "2025-01-24", "mild",     "cough,sore throat"),
    (20, "P020", cov, "general_hospital","2025-01-25", "mild",     "cough,rhinorrhoea,headache"),
    (21, "P021", flu, "secondary_school","2025-01-22", "mild",     "fever,cough,sore throat"),
    (22, "P022", flu, "care_home_A",     "2025-01-23", "severe",   "fever,cough,dyspnoea,hypoxia,fatigue"),
    (23, "P023", rsv, "primary_school",  "2025-01-20", "mild",     "cough,rhinorrhoea"),
    (24, "P024", cov, "royal_infirmary", "2025-01-25", "moderate", "fever,cough,dyspnoea"),
    (25, "P025", cov, "royal_infirmary", "2025-01-26", "mild",     "cough,fatigue"),
]

# Store case-report hub nodes with star topology
case_nodes = Dict{Int, Any}()
patient_pathogen = Dict{String, Any}()  # for later use
patient_severity = Dict{String, String}()
patient_onset = Dict{String, String}()

for (id, pid, pathogen, loc_id, onset, severity, symp_str) in case_records
    # Case report hub node
    case = mem_vertex!(store, "Case $id: $pid", "case reports")
    case_nodes[id] = case

    # Patient → Case (patient leads to case report)
    mem_edge!(store, patient_nodes[pid], "then", case)

    # Case → Pathogen (infected by — a causal relationship)
    mem_edge!(store, case, "then", pathogen)

    # Case → Location (case at location — containment)
    mem_edge!(store, locations[loc_id], "contain", case)

    # Case properties — EXPRESS
    onset_node = mem_vertex!(store, "onset: $onset", "timeline")
    mem_edge!(store, case, "note", onset_node)

    sev_node = mem_vertex!(store, "severity: $severity", "clinical")
    mem_edge!(store, case, "note", sev_node)

    # Symptoms — each symptom is linked from the case
    for s in split(symp_str, ",")
        s = strip(String(s))
        if haskey(symptom_nodes, s)
            mem_edge!(store, case, "has-pt", symptom_nodes[s])
        end
    end

    patient_pathogen[pid] = pathogen
    patient_severity[pid] = severity
    patient_onset[pid] = onset
end

println("$(length(case_records)) case reports with star topology")
println("Graph: $(node_count(store)) nodes, $(link_count(store)) links")
25 case reports with star topology
Graph: 112 nodes, 592 links

3. Contact Tracing and Transmission Events

In SST, contacts are NEAR relationships (proximity/similarity), while transmission events are LEADSTO (causal direction). This mirrors the fundamental spacetime distinction: spatial proximity ≠ causal ordering.

# Known contacts — bidirectional NEAR relationships
contacts = [
    ("P001", "P002"), ("P001", "P010"), ("P002", "P010"),
    ("P002", "P022"), ("P003", "P012"), ("P003", "P017"),
    ("P012", "P017"), ("P006", "P007"), ("P006", "P015"),
    ("P004", "P009"), ("P005", "P009"), ("P009", "P013"),
    ("P013", "P016"), ("P016", "P020"), ("P011", "P019"),
    ("P003", "P001"), ("P017", "P024"), ("P025", "P019"),
]

for (a, b) in contacts
    mem_edge!(store, patient_nodes[a], "ll", patient_nodes[b])
end

# Known transmissions — directed LEADSTO arrows
transmissions = [
    ("P001", "P002"), ("P001", "P010"), ("P002", "P022"),
    ("P006", "P007"), ("P004", "P009"), ("P009", "P013"),
    ("P013", "P016"), ("P016", "P020"), ("P011", "P019"),
    ("P003", "P017"),
]

for (src, tgt) in transmissions
    mem_edge!(store, patient_nodes[src], "then", patient_nodes[tgt])
end

println("$(length(contacts)) contact links (NEAR), $(length(transmissions)) transmissions (LEADSTO)")
18 contact links (NEAR), 10 transmissions (LEADSTO)

4. Outbreak Definitions

Each outbreak is a container that groups case reports by pathogen.

# Three concurrent outbreaks
flu_outbreak = mem_vertex!(store, "Influenza A H3N2 winter outbreak", "outbreaks")
cov_outbreak = mem_vertex!(store, "SARS-CoV-2 JN.1 nosocomial cluster", "outbreaks")
rsv_outbreak = mem_vertex!(store, "RSV-B paediatric wave", "outbreaks")

outbreak_map = Dict(flu => flu_outbreak, cov => cov_outbreak, rsv => rsv_outbreak)

for (id, pid, pathogen, _...) in case_records
    mem_edge!(store, outbreak_map[pathogen], "contain", case_nodes[id])
end

println("Three outbreaks defined")
println("Total graph: $(node_count(store)) nodes, $(link_count(store)) links")
Three outbreaks defined
Total graph: 115 nodes, 698 links

5. Epidemiological Queries — SST Graph Traversal

5a. Cases per pathogen

In RDF, this would be a SPARQL GROUP BY query. In SST, we traverse containment arrows from each outbreak node.

println("Cases by pathogen:")
println("─"^40)
for (pathogen_node, outbreak_node) in [(flu, flu_outbreak), (cov, cov_outbreak), (rsv, rsv_outbreak)]
    cone = forward_cone(store, outbreak_node.nptr; depth=1)
    # Collect all unique destination nodes from cone paths
    seen = Set{NodePtr}()
    cases = 0
    for path in cone.paths
        for lnk in path
            lnk.dst in seen && continue
            push!(seen, lnk.dst)
            n = mem_get_node(store, lnk.dst)
            if n !== nothing && startswith(n.s, "Case ")
                cases += 1
            end
        end
    end
    println("  $(rpad(pathogen_node.s, 25)) $cases cases")
end
Cases by pathogen:
────────────────────────────────────────
  Influenza A H3N2          9 cases
  SARS-CoV-2 JN.1           10 cases
  RSV subtype B             6 cases

5b. Severity distribution

println("Severity distribution:")
println(rpad("Pathogen", 25), rpad("Severe", 10), rpad("Moderate", 10), "Mild")
println("─"^55)

pathogen_names = Dict(flu => "Influenza A H3N2", cov => "SARS-CoV-2 JN.1", rsv => "RSV subtype B")

for (pnode, pname) in pathogen_names
    counts = Dict("severe" => 0, "moderate" => 0, "mild" => 0)
    for (_, pid, pathogen, _, _, severity, _) in case_records
        if pathogen === pnode
            counts[severity] = get(counts, severity, 0) + 1
        end
    end
    println(rpad(pname, 25),
            rpad(string(counts["severe"]), 10),
            rpad(string(counts["moderate"]), 10),
            counts["mild"])
end
Severity distribution:
Pathogen                 Severe    Moderate  Mild
───────────────────────────────────────────────────────
RSV subtype B            2         2         2
Influenza A H3N2         4         3         2
SARS-CoV-2 JN.1          1         3         6

5c. Cases per location

println("Cases by location:")
println("─"^40)
for (loc_id, loc_node) in sort(collect(locations), by=x->x.first)
    # Count cases contained at this location
    count = 0
    for (_, _, _, lid, _, _, _) in case_records
        if lid == loc_id
            count += 1
        end
    end
    label = first(split(loc_node.s, '\n'))
    println("  $(rpad(label, 30)) $count cases")
end
Cases by location:
────────────────────────────────────────
  Sunrise Care Home              4 cases
  Meadowbrook Care Home          2 cases
  General Hospital               7 cases
  Oakfield Primary School        3 cases
  Royal Infirmary                7 cases
  Riverside Academy              2 cases

5d. Most common symptoms

# Count symptom frequency across all case reports using graph links
println("Symptom frequency:")
println("─"^35)

symptom_freq = Dict{String, Int}()
for (_, _, _, _, _, _, symp_str) in case_records
    for s in split(symp_str, ",")
        s = strip(String(s))
        symptom_freq[s] = get(symptom_freq, s, 0) + 1
    end
end

for (s, f) in sort(collect(symptom_freq), by=x -> -x.second)
    bar = "█"^f
    println("  $(rpad(s, 15)) $bar $f")
end
Symptom frequency:
───────────────────────────────────
  cough           ████████████████████████ 24
  fever           ██████████████ 14
  dyspnoea        ████████ 8
  hypoxia         ██████ 6
  fatigue         ██████ 6
  sore throat     █████ 5
  myalgia         █████ 5
  rhinorrhoea     █████ 5
  wheeze          ████ 4
  tachypnoea      ██ 2
  headache        ██ 2

5e. Vulnerable patients with severe outcomes

println("Severe cases with risk factors:")
println("─"^60)
for (id, age_group, risks, vax) in patient_data
    if haskey(patient_severity, id) && patient_severity[id] == "severe" && !isempty(risks)
        pathogen_name = if haskey(patient_pathogen, id)
            patient_pathogen[id].s
        else
            "unknown"
        end
        println("  $id ($age_group) — $pathogen_name, risks: $(join(risks, ", "))")
    end
end
Severe cases with risk factors:
────────────────────────────────────────────────────────────
  P001 (65+) — Influenza A H3N2, risks: age over 65, chronic lung disease
  P005 (65+) — SARS-CoV-2 JN.1, risks: age over 65, cardiovascular disease
  P010 (65+) — Influenza A H3N2, risks: age over 65, immunocompromised
  P014 (65+) — Influenza A H3N2, risks: age over 65, chronic lung disease
  P022 (65+) — Influenza A H3N2, risks: age over 65, immunocompromised

5f. Healthcare worker cases

println("Healthcare worker cases (potential nosocomial transmission):")
println("─"^65)
hcw_ids = [id for (id, _, risks, _) in patient_data if "healthcare worker" in risks]
for hcw in sort(hcw_ids)
    if haskey(patient_onset, hcw)
        pname = patient_pathogen[hcw].s
        # Find location
        loc_name = ""
        for (_, pid, _, lid, _, _, _) in case_records
            if pid == hcw
                loc_name = locations[lid].s
                break
            end
        end
        println("  $(patient_onset[hcw]) | $hcw | $pname | $loc_name")
    end
end
Healthcare worker cases (potential nosocomial transmission):
─────────────────────────────────────────────────────────────────
  2025-01-17 | P003 | Influenza A H3N2 | Royal Infirmary
  2025-01-22 | P012 | SARS-CoV-2 JN.1 | Royal Infirmary
  2025-01-24 | P017 | SARS-CoV-2 JN.1 | Royal Infirmary
  2025-01-26 | P025 | SARS-CoV-2 JN.1 | Royal Infirmary

6. Contact Tracing — Forward Cones and Path Finding

SST’s forward cone is the natural analogue of SPARQL property paths. Starting from an index case, the cone traces all reachable nodes through causal (LEADSTO) arrows — exactly what epidemiological contact tracing needs.

6a. Contact chain from index case P001

# Forward cone from P001 — traces all reachable connections
cone_p001 = forward_cone(store, patient_nodes["P001"].nptr; depth=5)

# Collect all unique patient nodes reachable from P001
reachable = Set{String}()
for path in cone_p001.paths
    for lnk in path
        n = mem_get_node(store, lnk.dst)
        if n !== nothing && n.chap == "patients" && n.s != "P001"
            push!(reachable, n.s)
        end
    end
end

println("Contact/transmission chain from P001 ($(length(reachable)) patients reachable):")
for pid in sort(collect(reachable))
    println("  → $pid")
end
Contact/transmission chain from P001 (3 patients reachable):
  → P002
  → P010
  → P022

6b. Transmission chain — path finding

SST path solving traces specific routes through the graph, revealing the hop-by-hop transmission chain.

# Find paths from P004 (COVID index) to P020 (end of chain)
result = find_paths(store, patient_nodes["P004"].nptr, patient_nodes["P020"].nptr; max_depth=8)

if !isempty(result.paths)
    path_nodes = result.paths[1]  # first DAG path
    println("Transmission chain P004 → P020 ($(length(path_nodes)) nodes, $(length(result.paths)) paths found):")
    for (i, nptr) in enumerate(path_nodes)
        n = mem_get_node(store, nptr)
        if n !== nothing
            label = first(split(n.s, '\n'))
            prefix = i == 1 ? "  " : "  → "
            println("$prefix$label [$(n.chap)]")
        end
    end
else
    println("No DAG path found P004 → P020 ($(length(result.loops)) loop paths)")
end
Transmission chain P004 → P020 (5 nodes, 1 paths found):
  P004 [patients]
  → P009 [patients]
  → P013 [patients]
  → P016 [patients]
  → P020 [patients]

6c. Backward cone — who could have infected P022?

# Backward cone traces incoming arrows — who led to this patient?
cone_back = backward_cone(store, patient_nodes["P022"].nptr; depth=3)

reachable_back = Set{String}()
for path in cone_back.paths
    for lnk in path
        n = mem_get_node(store, lnk.dst)
        if n !== nothing && n.chap == "patients" && n.s != "P022"
            push!(reachable_back, n.s)
        end
    end
end

println("Backward cone from P022 (potential sources):")
for pid in sort(collect(reachable_back))
    println("  ← $pid")
end
Backward cone from P022 (potential sources):
  ← P001
  ← P002

7. Network Analysis — Identifying Super-Spreaders

7a. Adjacency matrix and centrality

# Build adjacency matrix from patient transmission subgraph
adj = SemanticSpacetime.AdjacencyMatrix()

for (src, tgt) in transmissions
    src_ptr = patient_nodes[src].nptr
    tgt_ptr = patient_nodes[tgt].nptr
    SemanticSpacetime.add_edge!(adj, src_ptr, tgt_ptr, 1.0)
end

# Add contact edges with lower weight
for (a, b) in contacts
    a_ptr = patient_nodes[a].nptr
    b_ptr = patient_nodes[b].nptr
    SemanticSpacetime.add_edge!(adj, a_ptr, b_ptr, 0.3)
end

println(graph_summary(adj))
Graph Summary
─────────────
  Nodes:   20
  Links:   18 (directed)
  Sources: 6
  Sinks:   7
  Top centrality:
    (1,40)  0.0500
    (1,21)  0.0500
    (1,27)  0.0500
    (1,26)  0.0500
    (1,42)  0.0500

7b. Eigenvector centrality — who is most connected?

if !isempty(adj.nodes)
    centrality = eigenvector_centrality(adj; max_iter=100, tol=1e-6)
    ranked = sort(collect(centrality), by=x -> x.second, rev=true)

    println("Patient centrality (transmission + contact network):")
    println("─"^45)
    for (nptr, score) in ranked[1:min(10, length(ranked))]
        node = mem_get_node(store, nptr)
        if node !== nothing
            pid = node.s
            sev = get(patient_severity, pid, "?")
            println("  $(rpad(pid, 8)) score=$(rpad(round(score, digits=4), 8)) severity=$sev")
        end
    end
end
Patient centrality (transmission + contact network):
─────────────────────────────────────────────
  P020     score=0.05     severity=mild
  P001     score=0.05     severity=severe
  P007     score=0.05     severity=moderate
  P006     score=0.05     severity=moderate
  P022     score=0.05     severity=severe
  P015     score=0.05     severity=mild
  P010     score=0.05     severity=severe
  P004     score=0.05     severity=mild
  P017     score=0.05     severity=mild
  P003     score=0.05     severity=mild

7c. Sources and sinks — index cases and endpoints

# Build a directed-only adjacency matrix for transmission
adj_tx = SemanticSpacetime.AdjacencyMatrix()
for (src, tgt) in transmissions
    SemanticSpacetime.add_edge!(adj_tx, patient_nodes[src].nptr, patient_nodes[tgt].nptr, 1.0)
end

sources = find_sources(adj_tx)
sinks = find_sinks(adj_tx)

println("Index cases (transmission sources):")
for nptr in sources
    n = mem_get_node(store, nptr)
    n !== nothing && println("  ⚠ $(n.s) — $(get(patient_pathogen, n.s, nothing) !== nothing ? patient_pathogen[n.s].s : "?")")
end

println("\nEndpoint cases (transmission sinks):")
for nptr in sinks
    n = mem_get_node(store, nptr)
    n !== nothing && println("  ○ $(n.s)")
end
Index cases (transmission sources):
  ⚠ P001 — Influenza A H3N2
  ⚠ P003 — Influenza A H3N2
  ⚠ P004 — SARS-CoV-2 JN.1
  ⚠ P006 — RSV subtype B
  ⚠ P011 — RSV subtype B

Endpoint cases (transmission sinks):
  ○ P007
  ○ P010
  ○ P017
  ○ P019
  ○ P020
  ○ P022

8. Risk Inference — SST Contextual Analysis

Instead of Datalog reasoning, SST uses contextual analysis and graph traversal to infer risk categories. The spacetime type system provides a natural framework: LEADSTO arrows carry causal weight, while CONTAINS and EXPRESS arrows provide structural context.

8a. Risk classification

println("Risk Classification")
println("════════════════════")

# High-risk elderly: age 65+ with severe outcome
println("\nHigh-risk elderly (65+ with severe outcome):")
for (id, age_group, risks, vax) in patient_data
    if age_group == "65+" && get(patient_severity, id, "") == "severe"
        println("  ⚠ $id — risks: $(isempty(risks) ? "none" : join(risks, ", "))")
    end
end

# High-risk paediatric: age 0-4 with severe outcome
println("\nHigh-risk paediatric (0-4 with severe outcome):")
for (id, age_group, _, _) in patient_data
    if age_group == "0-4" && get(patient_severity, id, "") == "severe"
        println("  ⚠ $id")
    end
end

# Potential super-spreaders: patients who transmitted to ≥2 others
tx_count = Dict{String, Int}()
for (src, _) in transmissions
    tx_count[src] = get(tx_count, src, 0) + 1
end
println("\nPotential super-spreaders (transmitted to ≥2):")
for (pid, count) in sort(collect(tx_count), by=x -> -x.second)
    if count >= 2
        println("  ⚠ $pid — infected $count others")
    end
end

# HCW-exposed patients
println("\nPatients exposed via healthcare worker contact:")
for (a, b) in contacts
    if "healthcare worker" in [rf for (id, _, rfs, _) in patient_data if id == a for rf in rfs]
        println("  → $b (exposed via HCW $a)")
    end
end
Risk Classification
════════════════════

High-risk elderly (65+ with severe outcome):
  ⚠ P001 — risks: age over 65, chronic lung disease
  ⚠ P005 — risks: age over 65, cardiovascular disease
  ⚠ P010 — risks: age over 65, immunocompromised
  ⚠ P014 — risks: age over 65, chronic lung disease
  ⚠ P022 — risks: age over 65, immunocompromised

High-risk paediatric (0-4 with severe outcome):
  ⚠ P011
  ⚠ P019

Potential super-spreaders (transmitted to ≥2):
  ⚠ P001 — infected 2 others

Patients exposed via healthcare worker contact:
  → P012 (exposed via HCW P003)
  → P017 (exposed via HCW P003)
  → P017 (exposed via HCW P012)
  → P001 (exposed via HCW P003)
  → P024 (exposed via HCW P017)
  → P019 (exposed via HCW P025)

8b. Serial interval analysis

println("Serial Interval Analysis")
println("════════════════════════")
println(rpad("Pair", 16), rpad("Infector", 10), rpad("Infectee", 10),
        rpad("SI (days)", 12), "Class")
println("─"^58)

onset_day = Dict{String, Int}()
for (_, pid, _, _, onset, _, _) in case_records
    onset_day[pid] = parse(Int, split(onset, "-")[3])
end

for (i, (src, tgt)) in enumerate(transmissions)
    if haskey(onset_day, src) && haskey(onset_day, tgt)
        si = onset_day[tgt] - onset_day[src]
        cls = si < 2 ? "short" : (si < 5 ? "typical" : "long")
        println(rpad("pair$i", 16), rpad(src, 10), rpad(tgt, 10),
                rpad(string(si), 12), cls)
    end
end
Serial Interval Analysis
════════════════════════
Pair            Infector  Infectee  SI (days)   Class
──────────────────────────────────────────────────────────
pair1           P001      P002      1           short
pair2           P001      P010      3           typical
pair3           P002      P022      7           long
pair4           P006      P007      1           short
pair5           P004      P009      2           typical
pair6           P009      P013      1           short
pair7           P013      P016      1           short
pair8           P016      P020      1           short
pair9           P011      P019      2           typical
pair10          P003      P017      7           long

8c. Clinical decision support

println("Clinical Decision Support")
println("═════════════════════════")

actions = Dict{String, Vector{String}}()

for (id, age_group, risks, _) in patient_data
    sev = get(patient_severity, id, "")
    n_contacts = count(c -> c[1] == id || c[2] == id, contacts)

    # Isolation required — severe with contacts
    if sev == "severe" && n_contacts >= 1
        push!(get!(actions, "ISOLATION REQUIRED", String[]), id)
    end

    # ICU referral — severe + elderly
    if sev == "severe" && age_group == "65+"
        push!(get!(actions, "ICU REFERRAL", String[]), id)
    end

    # Contact tracing priority — ≥3 contacts
    if n_contacts >= 3
        push!(get!(actions, "CONTACT TRACING PRIORITY", String[]), id)
    end

    # Paediatric escalation — severe + under 5
    if sev == "severe" && age_group == "0-4"
        push!(get!(actions, "PAEDIATRIC ESCALATION", String[]), id)
    end
end

for (action, pids) in sort(collect(actions))
    println("\n  $action:")
    for p in sort(pids)
        println("    → $p")
    end
end
Clinical Decision Support
═════════════════════════

  CONTACT TRACING PRIORITY:
    → P001
    → P002
    → P003
    → P009
    → P017

  ICU REFERRAL:
    → P001
    → P005
    → P010
    → P014
    → P022

  ISOLATION REQUIRED:
    → P001
    → P005
    → P010
    → P011
    → P019
    → P022

  PAEDIATRIC ESCALATION:
    → P011
    → P019

9. Weighted Search — Risk Propagation

SST’s weighted search propagates risk scores through the network, modelling how outbreak risk diffuses through contact chains.

# Weighted search from index case P001
wpaths = weighted_search(store, patient_nodes["P001"].nptr; max_depth=4)
ranked = rank_by_weight(wpaths)

println("Risk propagation from index case P001 ($(length(ranked)) paths found):")
println("─"^55)
seen = Set{String}()
for wp in ranked[1:min(15, length(ranked))]
    # Show the endpoint of each path
    endpoint = last(wp.nodes)
    n = mem_get_node(store, endpoint)
    if n !== nothing && n.chap == "patients" && n.s ∉ seen && n.s != "P001"
        push!(seen, n.s)
        sev = get(patient_severity, n.s, "?")
        println("  $(rpad(n.s, 8)) path_weight=$(rpad(round(wp.total_weight, digits=2), 8)) severity=$sev")
    end
end
Risk propagation from index case P001 (3368 paths found):
───────────────────────────────────────────────────────
  P005     path_weight=4.0      severity=severe
  P014     path_weight=4.0      severity=severe
  P018     path_weight=4.0      severity=moderate
  P022     path_weight=4.0      severity=severe

10. Text Search — Syndromic Surveillance

SST’s text search enables natural-language-style queries, useful for syndromic surveillance where clinicians search for symptom patterns.

# Search for respiratory distress indicators
for query in ["dyspnoea", "hypoxia", "wheeze", "severe"]
    results = mem_search_text(store, query)
    titles = filter(n -> !occursin('\n', n.s) && length(n.s) < 80, results)
    println("Search '$query': $(length(titles)) matches")
    for n in titles[1:min(3, length(titles))]
        println("  '$(n.s)' [$(n.chap)]")
    end
end
Search 'dyspnoea': 1 matches
  'dyspnoea' [symptoms]
Search 'hypoxia': 1 matches
  'hypoxia' [symptoms]
Search 'wheeze': 1 matches
  'wheeze' [symptoms]
Search 'severe': 1 matches
  'severity: severe' [clinical]

11. Epidemic Curve

dates = sort(unique([onset for (_, _, _, _, onset, _, _) in case_records]))
pathogen_short = Dict(flu => "Flu", cov => "CoV", rsv => "RSV")

println("Epidemic Curve (onset dates)")
println("════════════════════════════")
for d in dates
    day_cases = [(pid, p) for (_, pid, p, _, onset, _, _) in case_records if onset == d]
    markers = join([pathogen_short[p] for (_, p) in day_cases], " ")
    bar = "█"^length(day_cases)
    println("  $d | $bar $(length(day_cases)) ($markers)")
end
println("\nLegend: Flu=Influenza, CoV=SARS-CoV-2, RSV=RSV-B")
Epidemic Curve (onset dates)
════════════════════════════
  2025-01-14 | █ 1 (RSV)
  2025-01-15 | ██ 2 (Flu RSV)
  2025-01-16 | █ 1 (Flu)
  2025-01-17 | ██ 2 (Flu RSV)
  2025-01-18 | ██ 2 (Flu RSV)
  2025-01-19 | ██ 2 (Flu RSV)
  2025-01-20 | ███ 3 (CoV Flu RSV)
  2025-01-21 | ██ 2 (CoV Flu)
  2025-01-22 | ███ 3 (CoV CoV Flu)
  2025-01-23 | ██ 2 (CoV Flu)
  2025-01-24 | ██ 2 (CoV CoV)
  2025-01-25 | ██ 2 (CoV CoV)
  2025-01-26 | █ 1 (CoV)

Legend: Flu=Influenza, CoV=SARS-CoV-2, RSV=RSV-B

12. Location–Pathogen Heatmap

loc_names_sorted = sort(collect(keys(locations)))
pathogen_list = [("Influenza", flu), ("SARS-CoV-2", cov), ("RSV-B", rsv)]

println(rpad("Location", 28), join([rpad(p, 14) for (p, _) in pathogen_list]))
println("─"^(28 + 14 * length(pathogen_list)))

for loc_id in loc_names_sorted
    loc_name = locations[loc_id].s
    print(rpad(loc_name[1:min(27, length(loc_name))], 28))
    for (_, pnode) in pathogen_list
        n = count(r -> r[3] === pnode && r[4] == loc_id, case_records)
        cell = n == 0 ? "  ·" : "  $(repeat("█", n)) $n"
        print(rpad(cell, 14))
    end
    println()
end
Location                    Influenza     SARS-CoV-2    RSV-B         
──────────────────────────────────────────────────────────────────────
Sunrise Care Home             ████ 4        ·             ·           
Meadowbrook Care Home         ██ 2          ·             ·           
General Hospital              █ 1           ██████ 6      ·           
Oakfield Primary School       ·             ·             ███ 3       
Royal Infirmary               █ 1           ████ 4        ██ 2        
Riverside Academy             █ 1           ·             █ 1

13. Chapter Distribution

SST chapters provide a natural typing system. The distribution of nodes across chapters reveals the knowledge graph’s structure.

chap_counts = Dict{String,Int}()
for (_, node) in store.nodes
    chap_counts[node.chap] = get(chap_counts, node.chap, 0) + 1
end

println("Chapter distribution:")
for (chap, count) in sort(collect(chap_counts), by=x -> -x.second)
    bar = "█"^min(count, 40)
    println("  $(rpad(chap, 25)) $bar $count")
end
Chapter distribution:
  case reports              █████████████████████████ 25
  patients                  █████████████████████████ 25
  timeline                  █████████████ 13
  symptoms                  ███████████ 11
  risk factors              ████████ 8
  locations                 ██████ 6
  demographics              █████ 5
  pathogen taxonomy         █████ 5
  pathogens                 ███ 3
  location types            ███ 3
  pathogen properties       ███ 3
  clinical                  ███ 3
  outbreaks                 ███ 3
  interventions             ██ 2

14. Graph Statistics

# Full graph adjacency matrix
adj_full = SemanticSpacetime.AdjacencyMatrix()
for (nptr, node) in store.nodes
    for links in node.incidence
        for link in links
            SemanticSpacetime.add_edge!(adj_full, nptr, link.dst, Float64(link.wgt))
        end
    end
end

println(graph_summary(adj_full))
Graph Summary
─────────────
  Nodes:   115
  Links:   678 (directed)
  Sources: 0
  Sinks:   0
  Top centrality:
    (1,8)  1.0000
    (1,7)  0.6577
    (3,22)  0.4876
    (3,37)  0.4785
    (3,1)  0.4667

Summary

This vignette demonstrated how SemanticSpacetime.jl naturally models epidemiological knowledge using its spacetime type system:

SST FeatureEpidemiological Application
ChaptersEntity typing: patients, pathogens, locations, symptoms
LEADSTO arrows (then)Causal transmission chains, temporal ordering
CONTAINS arrows (contain, has-pt)Location containment, symptom membership, outbreak grouping
EXPRESS arrows (note)Metadata: onset dates, severity, vaccination status
NEAR arrows (ll)Contact tracing (spatial proximity without causation)
Forward/backward conesContact tracing chains (analogue of SPARQL property paths)
Path findingTransmission chain reconstruction
Eigenvector centralitySuper-spreader identification
Weighted searchRisk propagation through the network
Text searchSyndromic surveillance queries
Adjacency analysisSources (index cases), sinks (endpoints), degree distribution

SST vs RDF for Epidemiology

The SST approach differs from RDF in several key ways:

  • No separate ontology language: The arrow type system (LEADSTO, CONTAINS, EXPRESS, NEAR) provides built-in semantics without needing OWL declarations.
  • Spacetime semantics: The distinction between causal arrows (LEADSTO) and proximity arrows (NEAR) maps directly to the epidemiological distinction between transmission (causal) and contact (spatial).
  • Graph-native queries: Instead of SPARQL, SST uses forward/backward cones, weighted search, and path finding — operations optimised for knowledge graph traversal.
  • Chapters as types: Chapter membership provides lightweight typing without the overhead of class hierarchies.

Key epidemiological findings from the scenario:

  • Influenza A H3N2 clustered in care homes (4 cases at Sunrise Care Home alone), with severe outcomes among elderly patients with comorbidities.
  • SARS-CoV-2 JN.1 formed a nosocomial chain at General Hospital (P004→P009→P013→P016→P020) amplified by healthcare worker contacts.
  • RSV-B primarily affected children in school settings, with two severe paediatric cases requiring hospitalisation.
  • Network centrality identified P001 and P009 as key nodes in the transmission network, consistent with their roles as index cases in the care home and hospital clusters respectively.