Building Knowledge Graphs
Simon Frost
Introduction
A knowledge graph in Semantic Spacetime is more than a bag of triples — it is a structured, typed representation of a domain organized by chapters (topics) and contexts (tags). This vignette demonstrates how to build a realistic knowledge graph modelling an infectious disease system.
Setup
using SemanticSpacetime
SemanticSpacetime.reset_arrows!()
SemanticSpacetime.reset_contexts!()
# Register a comprehensive set of arrows
# LEADSTO
then_f = insert_arrow!("LEADSTO", "then", "leads to", "+")
then_b = insert_arrow!("LEADSTO", "prev", "preceded by", "-")
insert_inverse_arrow!(then_f, then_b)
causes_f = insert_arrow!("LEADSTO", "causes", "causes", "+")
causes_b = insert_arrow!("LEADSTO", "caused-by", "caused by", "-")
insert_inverse_arrow!(causes_f, causes_b)
affects_f = insert_arrow!("LEADSTO", "affects", "affects", "+")
affects_b = insert_arrow!("LEADSTO", "affected-by", "is affected by", "-")
insert_inverse_arrow!(affects_f, affects_b)
# CONTAINS
has_f = insert_arrow!("CONTAINS", "has", "contains", "+")
has_b = insert_arrow!("CONTAINS", "in", "is in", "-")
insert_inverse_arrow!(has_f, has_b)
part_f = insert_arrow!("CONTAINS", "has-part", "has part", "+")
part_b = insert_arrow!("CONTAINS", "part-of", "is part of", "-")
insert_inverse_arrow!(part_f, part_b)
# EXPRESS
note_arr = insert_arrow!("EXPRESS", "note", "has note", "+")
eg_arr = insert_arrow!("EXPRESS", "e.g.", "has example", "+")
name_arr = insert_arrow!("EXPRESS", "name", "has name", "+")
# NEAR
like_arr = insert_arrow!("NEAR", "like", "is similar to", "+")
store = MemoryStore()MemoryStore
Nodes: 0
Links: 0
Chapters:Chapters as Topic Boundaries
Chapters organize the graph into thematic sections. A node belongs to exactly one chapter, but edges can cross chapter boundaries. Think of chapters as sections in a notebook.
# Chapter: pathogen — describe the virus
virus = mem_vertex!(store, "SARS-CoV-2", "pathogen")
spike = mem_vertex!(store, "spike protein", "pathogen")
rna = mem_vertex!(store, "RNA genome", "pathogen")
mem_edge!(store, virus, "has-part", spike)
mem_edge!(store, virus, "has-part", rna)
mem_edge!(store, virus, "name", mem_vertex!(store, "severe acute respiratory syndrome coronavirus 2", "pathogen"))
# Chapter: transmission — how the disease spreads
droplets = mem_vertex!(store, "respiratory droplets", "transmission")
aerosol = mem_vertex!(store, "aerosol transmission", "transmission")
contact = mem_vertex!(store, "close contact", "transmission")
mem_edge!(store, contact, "causes", mem_vertex!(store, "exposure", "transmission"), ["airborne"])
mem_edge!(store, droplets, "like", aerosol)
# Chapter: disease — clinical progression
infection = mem_vertex!(store, "infection", "disease")
incubation = mem_vertex!(store, "incubation period", "disease")
symptoms = mem_vertex!(store, "symptom onset", "disease")
fever = mem_vertex!(store, "fever", "disease")
cough = mem_vertex!(store, "cough", "disease")
recovery = mem_vertex!(store, "recovery", "disease")
severe = mem_vertex!(store, "severe disease", "disease")
# Temporal chain of disease progression
mem_edge!(store, infection, "then", incubation)
mem_edge!(store, incubation, "then", symptoms)
mem_edge!(store, symptoms, "then", recovery)
mem_edge!(store, symptoms, "then", severe)
# Symptoms as parts of the disease
mem_edge!(store, symptoms, "has", fever)
mem_edge!(store, symptoms, "has", cough)
# Chapter: intervention — public health responses
vaccine = mem_vertex!(store, "mRNA vaccine", "intervention")
mask = mem_vertex!(store, "face mask", "intervention")
distancing = mem_vertex!(store, "social distancing", "intervention")
println("Graph: $(node_count(store)) nodes, $(link_count(store)) links")
println("Chapters: ", join(mem_get_chapters(store), ", "))Graph: 18 nodes, 20 links
Chapters: disease, intervention, pathogen, transmissionCross-Chapter Edges
Edges can link nodes across chapters, capturing how different aspects of a domain interact:
# The virus causes infection (pathogen → disease)
mem_edge!(store, virus, "causes", infection, ["epidemiology"])
# Spike protein enables transmission (pathogen → transmission)
mem_edge!(store, spike, "causes", mem_vertex!(store, "cell entry", "transmission"), ["mechanism"])
# Vaccines affect the virus (intervention → pathogen)
mem_edge!(store, vaccine, "affects", spike, ["immunology"])
# Masks reduce transmission (intervention → transmission)
mem_edge!(store, mask, "affects", droplets, ["prevention"])
mem_edge!(store, distancing, "affects", contact, ["prevention"])
println("Cross-chapter links created")
println("Updated: $(node_count(store)) nodes, $(link_count(store)) links")Cross-chapter links created
Updated: 19 nodes, 30 linksUsing Contexts
Contexts are metadata tags attached to edges that describe the perspective or domain of a relationship. They enable filtered views of the same graph.
# Add nodes with contextual edges
r0 = mem_vertex!(store, "basic reproduction number", "epidemiology")
mem_edge!(store, r0, "note",
mem_vertex!(store, "R0 ≈ 2-3 for original strain", "epidemiology"),
["statistics", "modelling"])
mem_edge!(store, r0, "note",
mem_vertex!(store, "R0 influenced by contact patterns", "epidemiology"),
["behaviour", "modelling"])
mem_edge!(store, r0, "affects", severe,
["public-health", "modelling"])
println("Contextual edges added")Contextual edges addedIdempotent Operations
A key feature of SST is that mem_vertex! is idempotent — adding the same node twice returns the existing one. This simplifies graph construction since you don’t need to track whether a node already exists.
v1 = mem_vertex!(store, "SARS-CoV-2", "pathogen")
v2 = mem_vertex!(store, "SARS-CoV-2", "pathogen")
println("Same node? $(v1.nptr == v2.nptr)")
println("Node count unchanged: $(node_count(store))")Same node? true
Node count unchanged: 22Inspecting the Graph
Let’s examine the structure we’ve built:
# Show node distribution across chapters
for chap in sort(mem_get_chapters(store))
count = 0
for (_, node) in store.nodes
if node.chap == chap
count += 1
end
end
println(" $chap: $count nodes")
end disease: 7 nodes
epidemiology: 3 nodes
intervention: 3 nodes
pathogen: 4 nodes
transmission: 5 nodes# Show the incidence structure of the virus node
channel_names = ["-EXPRESS", "-CONTAINS", "-LEADSTO", "NEAR",
"+LEADSTO", "+CONTAINS", "+EXPRESS"]
println("Links from 'SARS-CoV-2':")
for (i, links) in enumerate(virus.incidence)
for link in links
dst = mem_get_node(store, link.dst)
arr = get_arrow_by_ptr(link.arr)
if dst !== nothing
println(" [$(channel_names[i])] —($(arr.short))→ '$(dst.s)'")
end
end
endLinks from 'SARS-CoV-2':
[+LEADSTO] —(causes)→ 'infection'
[+CONTAINS] —(has-part)→ 'spike protein'
[+CONTAINS] —(has-part)→ 'RNA genome'
[+EXPRESS] —(name)→ 'severe acute respiratory syndrome coronavirus 2'Building with Annotations
EXPRESS edges are particularly useful for adding metadata, examples, and notes to any node:
# Annotate the vaccine node
mem_edge!(store, vaccine, "note",
mem_vertex!(store, "Pfizer-BioNTech BNT162b2", "intervention"))
mem_edge!(store, vaccine, "note",
mem_vertex!(store, "Moderna mRNA-1273", "intervention"))
mem_edge!(store, vaccine, "e.g.",
mem_vertex!(store, "two-dose regimen with 21-day interval", "intervention"))
println("Final graph: $(node_count(store)) nodes, $(link_count(store)) links")Final graph: 25 nodes, 37 linksSummary
Key patterns for building SST knowledge graphs:
- Organize by chapter — group related nodes into thematic sections
- Use typed arrows — choose the right STType for each relationship
- Add context — tag edges with topic keywords for filtered views
- Cross-reference — connect nodes across chapters to capture cross-domain relationships
- Annotate freely — use EXPRESS edges for notes, examples, and metadata