Text Analysis API

Text to N4L

SemanticSpacetime.TextSignificance - Type
TextSignificance

Tracks word/n-gram frequencies and last-seen positions for computing intentionality scores during text analysis.

SemanticSpacetime.score_sentence - Function
score_sentence(text::String, vocab::Dict{String,Int}) -> Float64

Compute an intentionality score for a sentence given a vocabulary of word frequencies. Higher scores indicate more distinctive/meaningful sentences.

Uses a simplified static intentionality model: words that appear a moderate number of times (not too rare, not too common) in the corpus score highest.
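The exact formula is not given here. As a minimal sketch of what "moderate frequency scores highest" could look like, the following assumes a hypothetical per-word weight f * exp(-f / peak): zero for unseen words, largest at f == peak, and decaying for very common words. The names word_weight and score_sentence_sketch, the peak value, and the punctuation handling are illustrative assumptions, not the package's implementation.

```julia
# Hypothetical weight: rises with frequency, then decays; largest at f == peak.
word_weight(f::Int; peak::Float64=5.0) = f * exp(-f / peak)

# Sum the weights of all words in the sentence (assumed scoring rule).
function score_sentence_sketch(text::AbstractString, vocab::Dict{String,Int})
    score = 0.0
    for w in split(lowercase(text))
        # Drop leading/trailing punctuation before the vocabulary lookup.
        word = String(strip(w, ['.', ',', ';', ':', '!', '?']))
        score += word_weight(get(vocab, word, 0))
    end
    return score
end

vocab = Dict("the" => 50, "entropy" => 5, "cat" => 1)
moderate = score_sentence_sketch("The entropy.", vocab)
rare = score_sentence_sketch("The cat.", vocab)
```

Under these assumptions a sentence containing a moderately frequent word ("entropy") outscores one whose content word is very rare ("cat"), while the very common "the" contributes almost nothing to either.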

SemanticSpacetime.extract_significant_sentences - Function
extract_significant_sentences(text::String; target_percent::Float64=50.0) -> Vector{String}

Extract the most significant sentences from a text, combining both running and static intentionality analysis. Returns approximately target_percent of the original sentences (merged from both methods).

SemanticSpacetime.text_to_n4l - Function
text_to_n4l(text::String; chapter::String="", target_percent::Float64=50.0) -> String

Convert plain text to N4L format output. Extracts the most significant sentences and formats them as N4L notation with chapter structure, sequence markers, and extract relationships.

Returns the N4L formatted string.

N-gram Fractionation

SemanticSpacetime.reset_ngram_state! - Function
reset_ngram_state!()

Initialize (or reset) the module-level n-gram frequency, location, and last-seen tracking maps. Each is a Vector of N_GRAM_MAX Dicts.

SemanticSpacetime.new_ngram_map - Function
new_ngram_map() -> Vector{Dict{String,Float64}}

Create a fresh n-gram frequency map (vector of N_GRAM_MAX empty dicts).

SemanticSpacetime.split_into_para_sentences - Function
split_into_para_sentences(text::AbstractString) -> Vector{Vector{Vector{String}}}

Split text into paragraphs → sentences → fragments. Returns a 3-level nested structure: paragraphs of sentences of fragment strings.

SemanticSpacetime.split_punctuation_text - Function
split_punctuation_text(s::AbstractString) -> Vector{String}

Split text on intentional separators (quotes, dashes, colons, etc.), respecting balanced parentheses.

SemanticSpacetime.un_paren - Function
un_paren(s::AbstractString) -> Tuple{String, Bool}

If s is wrapped in matching brackets/parens, return the inner content and true. Otherwise return trimmed s and false.
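The behaviour described above can be sketched as follows. This is an illustrative reimplementation, not the package's code: the name un_paren_sketch, the set of bracket pairs, and the trimming of the inner content are assumptions.

```julia
# Assumed bracket pairs; the real function may recognise a different set.
const BRACKET_PAIRS = Dict('(' => ')', '[' => ']', '{' => '}')

function un_paren_sketch(s::AbstractString)
    t = strip(s)
    if length(t) >= 2 && haskey(BRACKET_PAIRS, first(t))
        opener, closer = first(t), BRACKET_PAIRS[first(t)]
        if last(t) == closer
            depth = 0
            wrapped = true
            for (i, c) in pairs(t)
                depth += (c == opener) - (c == closer)
                # The first bracket closing early means s is not one wrapped group.
                if depth == 0 && i != lastindex(t)
                    wrapped = false
                    break
                end
            end
            if wrapped && depth == 0
                inner = t[nextind(t, firstindex(t)):prevind(t, lastindex(t))]
                return (String(strip(inner)), true)
            end
        end
    end
    return (String(t), false)
end
```

The depth check is what distinguishes "(a b)" from "(a) and (b)": the latter starts and ends with parentheses but is not a single wrapped group, so it is returned unchanged with false.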

SemanticSpacetime.count_parens - Function
count_parens(s::AbstractString) -> Vector{String}

Split text respecting balanced parentheses/brackets/braces. Returns fragments where parenthesized groups are kept intact.

SemanticSpacetime.excluded_by_bindings - Function
excluded_by_bindings(firstword::AbstractString, lastword::AbstractString) -> Bool

Check if an n-gram starts or ends with binding words that promise to connect to adjacent content, making the n-gram a poor standalone fragment.

SemanticSpacetime.fractionate - Function
fractionate(frag::AbstractString, L::Int, frequency::Vector{Dict{String,Float64}}, min_n::Int) -> Vector{Vector{String}}

Extract n-grams from a text fragment using a round-robin buffer. Returns changeset where changeset[n] contains n-grams of length n.

SemanticSpacetime.next_word - Function
next_word(word::AbstractString, rrbuffer::Vector{Vector{String}}) -> (Vector{Vector{String}}, Vector{Vector{String}})

Process one word through the n-gram round-robin buffer. Returns (updated rrbuffer, changeset) where changeset[n] contains the new n-grams formed at each length n.
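To illustrate the round-robin buffer idea, here is a self-contained sketch. It assumes that rrbuffer[n] holds a sliding window of the last n words and that a full window emits one new n-gram; the names NMAX, next_word_sketch, and ngrams_sketch are hypothetical, and the real functions also apply frequency counting and binding-word exclusions that are omitted here.

```julia
const NMAX = 3  # hypothetical maximum n-gram length for this sketch

# Push one word through every window; return the updated buffer and the
# n-grams newly completed by this word (changeset[n] for length n).
function next_word_sketch(word::AbstractString, rrbuffer::Vector{Vector{String}})
    changeset = [String[] for _ in 1:NMAX]
    for n in 1:NMAX
        push!(rrbuffer[n], String(word))
        # Keep only the most recent n words in the length-n window.
        length(rrbuffer[n]) > n && popfirst!(rrbuffer[n])
        if length(rrbuffer[n]) == n
            push!(changeset[n], join(rrbuffer[n], " "))
        end
    end
    return rrbuffer, changeset
end

# Collect all n-grams of a fragment by feeding it word by word.
function ngrams_sketch(fragment::AbstractString)
    rrbuffer = [String[] for _ in 1:NMAX]
    collected = [String[] for _ in 1:NMAX]
    for w in split(fragment)
        rrbuffer, changeset = next_word_sketch(w, rrbuffer)
        for n in 1:NMAX
            append!(collected[n], changeset[n])
        end
    end
    return collected
end

grams = ngrams_sketch("the quick brown fox")
```

Here grams[2] holds the three bigrams of the fragment and grams[3] the two trigrams, each produced at the moment its last word arrives.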

SemanticSpacetime.fractionate_text - Function
fractionate_text(text::AbstractString) -> (Vector{Vector{Vector{String}}}, Int)

Clean, split, and build n-gram frequency/location maps from text. Updates the module-level STM_NGRAM_FREQ and STM_NGRAM_LOCA state. Returns (paragraphs, sentence_count).

SemanticSpacetime.fractionate_text_file - Function
fractionate_text_file(filename::AbstractString) -> (Vector{Vector{Vector{String}}}, Int)

Read file, clean, split, and build n-gram frequency/location maps. Returns (paragraphs, sentence_count).

N-gram Intentionality

SemanticSpacetime.ngram_static_intentionality - Function
ngram_static_intentionality(L::Int, s::AbstractString, freq::Float64) -> Float64

Compute the static significance of an n-gram string s within a document of L sentences. Intentionality = work / probability, using exponential deprecation based on SST cognitive scales (Dunbar numbers).

SemanticSpacetime.assess_static_intent - Function
assess_static_intent(frag::AbstractString, L::Int, frequency::Vector{Dict{String,Float64}}, min_n::Int) -> Float64

Score a fragment by static intentionality using the n-gram round-robin buffer.

SemanticSpacetime.intentional_ngram - Function
intentional_ngram(n::Int, ngram::AbstractString, L::Int, coherence_length::Int) -> Bool

Determine if an n-gram is intentional (anomalous) vs ambient (repeated regular pattern). Unigrams are never intentional. Short documents are all intentional. For longer documents, checks if the distribution of inter-occurrence spacings is broad.

SemanticSpacetime.interval_radius - Function
interval_radius(n::Int, ngram::AbstractString) -> (Int, Int, Int)

Find the minimum and maximum distances (in sentences) between occurrences of an n-gram. Returns (occurrences, mindelta, maxdelta).
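The computation can be sketched on an explicit list of sentence positions. This is an illustration under assumptions: the real function looks the positions up from module state by (n, ngram), whereas the hypothetical interval_radius_sketch takes them directly, measures gaps between consecutive occurrences, and returns zeros when there are fewer than two occurrences.

```julia
# locations: sentence indices at which the n-gram occurs (assumed input).
function interval_radius_sketch(locations::Vector{Int})
    occurrences = length(locations)
    # With fewer than two occurrences there is no spacing to measure
    # (returning zeros here is an assumption of this sketch).
    occurrences < 2 && return (occurrences, 0, 0)
    deltas = diff(sort(locations))
    return (occurrences, minimum(deltas), maximum(deltas))
end

radius = interval_radius_sketch([2, 5, 6, 20])
```

For occurrences in sentences 2, 5, 6, and 20 the gaps are 3, 1, and 14, so the narrowest spacing is 1 and the widest is 14. A wide spread between the two is the kind of signal intentional_ngram is described as checking.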

SemanticSpacetime.assess_static_text_anomalies - Function
assess_static_text_anomalies(L::Int, frequencies, locations)

Split text n-grams into intentional (anomalous) vs ambient (contextual) parts. Returns (intent, context), both Vector{Vector{TextRank}} of size N_GRAM_MAX.

SemanticSpacetime.assess_text_coherent_coactivation - Function
assess_text_coherent_coactivation(L::Int, ngram_loc)

Global coherence analysis — separate n-grams into those that overlap across coherence intervals (ambient) and those unique to a single interval (condensate). Returns (overlap, condensate, partitions).

SemanticSpacetime.assess_text_fast_slow - Function
assess_text_fast_slow(L::Int, ngram_loc)

Running fast/slow separation by coherence intervals. For each pair of adjacent intervals, n-grams shared between them are "slow" (context), and those unique to one are "fast" (intentional). Returns (slow, fast, partitions).
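The adjacent-interval rule can be sketched with plain sets. This assumes each coherence interval has already been reduced to the set of n-grams occurring in it; fast_slow_sketch is a hypothetical name, and the real function works from n-gram location maps and per-length structures instead.

```julia
# partitions[p] is the set of n-grams seen in coherence interval p (assumed input).
function fast_slow_sketch(partitions::Vector{Set{String}})
    slow = Set{String}[]
    fast = Set{String}[]
    for p in 1:length(partitions)-1
        a, b = partitions[p], partitions[p+1]
        push!(slow, intersect(a, b))  # shared by both intervals: context
        push!(fast, symdiff(a, b))    # in exactly one interval: intentional
    end
    return slow, fast
end

intervals = [
    Set(["data model", "graph store"]),
    Set(["graph store", "query plan"]),
    Set(["query plan", "graph store", "cache miss"]),
]
slow, fast = fast_slow_sketch(intervals)
```

With three intervals there are two adjacent pairs, so slow and fast each have two entries. "graph store" persists across both pairs and lands in slow, while "data model" and "cache miss" each appear in only one interval of their pair and land in fast.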

SemanticSpacetime.coherence_set - Function
coherence_set(ngram_loc, L::Int, coherence_length::Int)

Partition n-grams into coherence sets based on their occurrence locations. Returns (C, partitions) where C[n][p] is a Dict{String,Int} for n-gram size n and partition p.

SemanticSpacetime.extract_intentional_tokens - Function
extract_intentional_tokens(L::Int, selected::Vector{TextRank}, nmin::Int, nmax::Int)

Extract fast/slow parts per partition and whole-document summaries. Returns (fastparts, slowparts, fastwhole, slowwhole).

Context Intelligence

SemanticSpacetime.context_intent_analysis - Function
context_intent_analysis(spectrum::Dict{String,Int}, clusters::Vector{String})

Separate intentional fragments (those with frequency below 3) from ambient ones. Returns (intentional::Vector{String}, ambient::Vector{String}).
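The frequency threshold can be illustrated in isolation. The sketch below ignores the clusters argument, whose role is not described here, and sorts the keys only to make the output deterministic; context_intent_sketch and its threshold keyword are hypothetical.

```julia
# Split fragments by frequency: below the threshold counts as intentional.
function context_intent_sketch(spectrum::Dict{String,Int}; threshold::Int=3)
    intentional = String[]
    ambient = String[]
    for fragment in sort!(collect(keys(spectrum)))
        if spectrum[fragment] < threshold
            push!(intentional, fragment)
        else
            push!(ambient, fragment)
        end
    end
    return intentional, ambient
end

spectrum = Dict("kernel panic" => 1, "login" => 7, "disk full" => 2, "ok" => 3)
intentional, ambient = context_intent_sketch(spectrum)
```

A fragment seen exactly 3 times is already treated as ambient, since the stated condition is a strict "less than 3".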

SemanticSpacetime.update_stm_context - Function
update_stm_context(store::AbstractSSTStore, ambient::String, key::String, now::Int64, params)::String

Update STM context from search parameters. Extracts tokens from the params and delegates to add_context.

SemanticSpacetime.add_context - Function
add_context(store::AbstractSSTStore, ambient::String, key::String, now::Int64, tokens::Vector{String})::String

Add tokens to STM tracking, prune forgotten entries, and return the combined context string of all active STM fragments.

SemanticSpacetime.commit_context_token! - Function
commit_context_token!(token::AbstractString, now::Int64, key::AbstractString)

Track a token in STM. If previously seen, moves it from intentional to ambient.

SemanticSpacetime.intersect_context_parts - Function
intersect_context_parts(context_clusters::Vector{String})

Compute pairwise overlap between context clusters. Returns (count, unique_clusters, overlap_matrix).

SemanticSpacetime.diff_clusters - Function
diff_clusters(l1::AbstractString, l2::AbstractString)

Return (shared, different) parts of two comma-separated context strings.
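A minimal sketch of the shared/different split follows. It assumes both parts are returned as comma-joined strings, that tokens are trimmed, and that order follows first appearance; diff_clusters_sketch is a hypothetical name and the package's return format may differ.

```julia
function diff_clusters_sketch(l1::AbstractString, l2::AbstractString)
    # Split on commas, trim whitespace, and drop empty entries.
    tokens(s) = [String(strip(t)) for t in split(s, ',') if !isempty(strip(t))]
    t1, t2 = tokens(l1), tokens(l2)
    shared = [t for t in t1 if t in t2]
    # Tokens that occur in exactly one of the two strings.
    different = vcat([t for t in t1 if !(t in t2)], [t for t in t2 if !(t in t1)])
    return join(shared, ","), join(different, ",")
end

shared, different = diff_clusters_sketch("work, travel, email", "email, travel, lunch")
```

Here "travel" and "email" occur in both strings, while "work" and "lunch" each occur in only one.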

SemanticSpacetime.overlap_matrix - Function
overlap_matrix(m1::Dict{String,Int}, m2::Dict{String,Int})

Return (shared_string, different_string) representing overlap and unique parts of two token frequency maps.

SemanticSpacetime.get_node_context - Function
get_node_context(store::AbstractSSTStore, node::Node) -> Vector{String}

Get the context strings attached to a node via the empty arrow ghost link. Returns a vector of context labels parsed from the comma-separated context string.

SemanticSpacetime.get_node_context_string - Function
get_node_context_string(store::AbstractSSTStore, node::Node) -> String

Get the context string from a node's ghost link (empty arrow, LEADSTO type). The context is stored as an incoming link with the "empty" arrow.

SemanticSpacetime.context_interferometry - Function
context_interferometry(context_clusters::Vector{String}) -> Nothing

Deprecated/placeholder. In the original Go source this function was deleted (body replaced with // deleted). Retained here as a no-op stub for API compatibility. Use intersect_context_parts and diff_clusters instead.

Time Semantics

SemanticSpacetime.do_nowt - Function
do_nowt(then::DateTime) -> Tuple{String, String}

Convert a DateTime to a semantic time representation. Returns (when_description, time_key) where:

  • when_description is a human-readable semantic time string
  • time_key is a compact key for database/context use

SemanticSpacetime.season - Function
season(month::AbstractString) -> Tuple{String, String}

Returns (northern_hemisphere_season, southern_hemisphere_season) for the given month name.
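As an illustration of the two-hemisphere return value, the sketch below assumes meteorological seasons (December to February is northern winter), English month names matched on their first three letters, and capitalised season names. None of these details are confirmed for the package; season_sketch and both lookup tables are hypothetical.

```julia
# Northern-hemisphere season by three-letter month prefix (assumed convention).
const NORTHERN_SEASONS = Dict(
    "dec" => "Winter", "jan" => "Winter", "feb" => "Winter",
    "mar" => "Spring", "apr" => "Spring", "may" => "Spring",
    "jun" => "Summer", "jul" => "Summer", "aug" => "Summer",
    "sep" => "Autumn", "oct" => "Autumn", "nov" => "Autumn",
)

# The southern hemisphere is half a year out of phase.
const OPPOSITE_SEASON = Dict(
    "Winter" => "Summer", "Summer" => "Winter",
    "Spring" => "Autumn", "Autumn" => "Spring",
)

function season_sketch(month::AbstractString)
    key = lowercase(first(strip(month), 3))
    north = NORTHERN_SEASONS[key]
    return (north, OPPOSITE_SEASON[north])
end

july = season_sketch("July")
```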

SemanticSpacetime.get_time_from_semantics - Function
get_time_from_semantics(speclist::Vector{String}, now::DateTime) -> DateTime

Parse a semantic time specification (Day3, Hr14, Mon, etc.) into a DateTime. The first element of speclist is ignored (it's typically a command prefix).

Log Analysis

SemanticSpacetime.parse_csv_log - Function
parse_csv_log(text::String; delimiter::Char=',', header::Bool=true) -> Vector{NamedTuple}

Parse CSV/TSV log data. First row is treated as header when header=true.
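The header handling can be sketched with Base functions alone. This is an assumption-laden illustration: parse_csv_sketch is a hypothetical name, quoted fields are not supported, rows whose field count differs from the header are skipped, and columns are named col1, col2, ... when header=false. The package may behave differently on each point.

```julia
function parse_csv_sketch(text::AbstractString; delimiter::Char=',', header::Bool=true)
    lines = [l for l in split(text, '\n') if !isempty(strip(l))]
    isempty(lines) && return NamedTuple[]
    first_fields = split(lines[1], delimiter)
    # Column names come from the first row, or are generated if there is no header.
    cols = header ? Symbol.(strip.(first_fields)) :
                    [Symbol("col", i) for i in 1:length(first_fields)]
    rows = NamedTuple[]
    for line in (header ? lines[2:end] : lines)
        fields = String.(strip.(split(line, delimiter)))
        length(fields) == length(cols) || continue  # skip malformed rows
        push!(rows, NamedTuple{Tuple(cols)}(Tuple(fields)))
    end
    return rows
end

log_text = "time,level,msg\n10:00,INFO,started\n10:01,ERROR,failed\n"
rows = parse_csv_sketch(log_text)
tsv_rows = parse_csv_sketch("a\tb\n1\t2"; delimiter='\t')
```

Each row is then accessible by column name, for example rows[2].level, and the same function handles TSV by passing delimiter='\t'.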

SemanticSpacetime.parse_log_to_sst! - Function
parse_log_to_sst!(store::MemoryStore, text::String;
                  config::LogParseConfig=default_log_config()) -> Dict{String,Int}

Parse log text and create SST nodes for each log entry.

  • Each entry becomes a node in the configured chapter
  • Sequential entries are linked via LEADSTO (then) if link_sequential=true
  • Log level, source, and extracted patterns become EXPRESS (note) annotations
  • Timestamp becomes a timeline annotation

Returns statistics: entries_parsed, nodes_created, links_created.

Text Breakdown

SemanticSpacetime.identify_entities - Function
identify_entities(text::String) -> Vector{EntitySuggestion}

Extract named entities and concepts from text using heuristic NLP patterns.

SemanticSpacetime.suggest_links - Function
suggest_links(text::String, entities::Vector{EntitySuggestion}) -> Vector{LinkSuggestion}

Given extracted entities, suggest SST relationships based on textual proximity and verb/preposition patterns.

SemanticSpacetime.propose_structure - Function
propose_structure(text::String; chapter::String="default") -> TextBreakdown

Full text analysis: extract entities, suggest links, generate N4L.