The Science of EDXML

Below, we will evaluate EDXML as a knowledge representation (KR) and point out key similarities and differences between it and other knowledge representation techniques.

Origin

Knowledge Representation (KR) is a field within Artificial Intelligence that aims to represent knowledge in a machine-readable form. This enables machines to reason with that knowledge and to assist humans in decision-making. Contrary to many Machine Learning techniques, the reasoning process can be fully transparent, which enables humans to interpret and verify its results.

EDXML was not conceived in the context of KR research. It was shaped into its current form by operational challenges in forensics, law enforcement, intelligence and cybersecurity. This explains its focus on simplicity and practical applicability rather than scientific advancement.

For those familiar with the field of KR, it may be useful to know how EDXML relates to other approaches that have been described in the literature.

Emergence

Knowledge representations usually take the form of a knowledge base, such as a collection of frames or a semantic network. The knowledge base is extracted from data or manually crafted. Knowledge in EDXML is emergent. Rather than representing the knowledge itself, it represents the original data in the form of events while indicating how each event evokes bits of knowledge. These knowledge fragments can be combined to form a true knowledge base. In a way, an EDXML event type can be regarded as a template for the knowledge contained in its events.
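The idea of events evoking knowledge fragments can be illustrated with a minimal sketch. It does not use the EDXML SDK or the EDXML data model. The event type, the property names and the attribute names such as account.name are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical event type: maps each event property to the concept
# attribute it evokes. All names here are illustrative assumptions.
LOGIN_EVENT_TYPE = {
    "user": "account.name",
    "address": "computer.ipv4",
}


def evoke(event, event_type):
    """Return the knowledge fragments that a single event evokes."""
    return [
        (event_type[prop], value)
        for prop, value in event.items()
        if prop in event_type  # properties without a mapping evoke nothing
    ]


def build_knowledge_base(events, event_type):
    """Combine the fragments of all events into one knowledge base."""
    knowledge = defaultdict(set)
    for event in events:
        for attribute, value in evoke(event, event_type):
            knowledge[attribute].add(value)
    return dict(knowledge)


# The original data stays intact; the knowledge base is derived from it.
events = [
    {"user": "alice", "address": "10.0.0.5", "time": "2023-01-01T10:00:00Z"},
    {"user": "alice", "address": "10.0.0.9", "time": "2023-01-02T08:30:00Z"},
]
knowledge_base = build_knowledge_base(events, LOGIN_EVENT_TYPE)
```

In this sketch the event type acts as the template: it is defined once and determines which fragments every event of that type contributes.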

EDXML bears some resemblance to the FrameNet project. FrameNet defines how to translate natural language text into knowledge, where text phrases evoke bits of knowledge. EDXML enables translating 'machine languages' (JSON, database records, ...) into knowledge, where events evoke bits of knowledge. The resemblance also touches on the storytelling analogy, where EDXML events are like the paragraphs in a novel.

Theoretically, when data can evoke knowledge, the inverse is also possible: knowledge that evokes data. Given a corpus of knowledge, one could synthesize EDXML events which evoke that knowledge. The generated data would have the same structural consistency as real data and, as a result, would look familiar to humans. This data could be used in systems testing or for training human operators, by replacing real EDXML data sources with sources that generate data based on a knowledge corpus. In the storytelling analogy, this would be the equivalent of improvising a new story based on a given set of character descriptions. The knowledge corpus is similar to a character sheet, which novelists use to weave consistent storylines into a novel.

This form of data synthesis is a current subject of active research.
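As a rough illustration of the inverse direction, the sketch below generates events from a small knowledge corpus. It is a toy example under stated assumptions, not the method under research: the corpus structure, the function name synthesize_logins and the event layout are all hypothetical.

```python
import random

# Hypothetical knowledge corpus ("character sheet"): what is known about
# each account. The structure is an assumption made for this sketch.
CORPUS = {
    "alice": {"addresses": ["10.0.0.5", "10.0.0.9"]},
    "bob": {"addresses": ["10.0.0.7"]},
}


def synthesize_logins(corpus, count, seed=0):
    """Generate login events that only evoke knowledge found in the corpus."""
    rng = random.Random(seed)  # seeded, so repeated runs give the same events
    events = []
    for _ in range(count):
        user = rng.choice(sorted(corpus))
        address = rng.choice(corpus[user]["addresses"])
        events.append({"user": user, "address": address})
    return events


synthetic_events = synthesize_logins(CORPUS, count=5)
```

Every generated event is consistent with the corpus: an account only ever appears together with one of its own addresses, which is what gives the synthetic data the structural consistency of real data.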

Data First

Most knowledge representations are specifically designed to represent knowledge in the form of a knowledge base. This knowledge base is the subject of study rather than the original data from which the knowledge was acquired. An EDXML document is more like a database than a knowledge base. It attempts to represent the original data as accurately as possible, retaining the context in which the knowledge exists. This property is highly appreciated in forensics, where analysts need to trace a specific analysis result all the way back to the original evidence.

One could say that EDXML is a data-first approach rather than a knowledge-first approach. The difference between the two approaches translates into different design choices. The most prominent one is perhaps the emergent nature of the knowledge in EDXML. Another choice is to target inference engines, not advanced semantic reasoners.

Semantic reasoners require a carefully curated knowledge base that is logically consistent. Due to the emergent nature of knowledge in EDXML, there is no guarantee that the knowledge is logically consistent, because the original data may be inconsistent, inaccurate, incomplete or even incorrect. As a result, semantic reasoners may not produce useful results. This issue is typically resolved by making adjustments to the knowledge base. However, given the emergent knowledge in EDXML, that would require changing the original data to satisfy the reasoner. Especially in fields like forensics, this is not acceptable. It would also require a skilled knowledge engineer to prepare the data before it can be used.

Inference engines are simpler and more robust than semantic reasoners. Targeting inference engines over semantic reasoners makes EDXML accessible and allows data to be studied as-is. EDXML aims to assist humans in their associative reasoning rather than replacing human reasoning entirely, combining machine power with human judgement.

The data-first approach makes EDXML a pragmatic knowledge representation, focusing on simplicity and practical applicability.

Modular Ontologies

EDXML ontologies are modular. Each EDXML data source produces event type definitions that are specific to that particular source. These event types may be defined in terms of object type definitions and concept definitions that are shared between multiple data sources. The shared object types and concepts form a common vocabulary for multiple data sources, which ensures that the data sources produce knowledge that can be correlated.

This method of ontology modularization is similar to the method of double articulation described by Mustafa Jarrar et al., who introduced the distinction between domain axiomatizations and application axiomatizations.
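The modular structure described above can be sketched in a few lines of plain Python. The sketch does not use the EDXML SDK. The two data sources, their event types and the object type names are hypothetical, and the dictionaries are a simplification of what a real ontology contains.

```python
# Shared vocabulary: object types used by several data sources.
# All identifiers are hypothetical and chosen for illustration only.
SHARED_OBJECT_TYPES = {"computer.ipv4", "account.name"}

# Each source defines its own event types in terms of the shared object types.
FIREWALL_ONTOLOGY = {"firewall.block": {"source": "computer.ipv4"}}
WEBSERVER_ONTOLOGY = {
    "web.request": {"client": "computer.ipv4", "user": "account.name"},
}


def uses_shared_vocabulary(ontology):
    """Check that every property refers to a shared object type."""
    return all(
        object_type in SHARED_OBJECT_TYPES
        for properties in ontology.values()
        for object_type in properties.values()
    )


def objects(event, ontology):
    """Return the (object type, value) pairs that one event contains."""
    properties = ontology[event["type"]]
    return {
        (properties[name], value)
        for name, value in event["values"].items()
        if name in properties
    }


def correlate(event_a, ontology_a, event_b, ontology_b):
    """Return the objects that two events from different sources share."""
    return objects(event_a, ontology_a) & objects(event_b, ontology_b)


blocked = {"type": "firewall.block", "values": {"source": "10.0.0.5"}}
request = {"type": "web.request", "values": {"client": "10.0.0.5", "user": "alice"}}
shared = correlate(blocked, FIREWALL_ONTOLOGY, request, WEBSERVER_ONTOLOGY)
```

The two sources use different property names (source and client), yet their events can still be correlated because both properties refer to the same shared object type.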
