> Topic Maps > HG4TM

HG4TM - Hypergraph for Topic Maps

Updated - November 21, 2002
Author : Bernard Vatant

Overview

This document is an attempt to express the Reference Model for Topic Maps (RM4TM) using Graph Theory. It follows a previous proposal fully developed in a paper presented by Pascal Auillans, Patrice Ossona de Mendez, Pierre Rosenstiehl and Bernard Vatant at ISWC 2002 conference. A gentle introduction to this previous model can be found here : HyperGraphTM.

The model proposed here tries to be as simple as possible, and has get rid of some features of the previous proposal such as "lift" and "semantic layers", that did not seem conformant to the Reference Model philosophy. To meet the requirements both of RM4TM and of hypergraph structure integrity, the subjects and the hypergraph are defined as distinct sets, and a representation mapping is defined between the two sets. Every object in the hypergraph represents at most one subject, but some subjects can be represented by two objects in the hypergraph.

Notes:
1. Current version does not explicitly meet all RM4TM requirements.
2. More prose and illustrations are certainly needed for clarity ...

1. The Topic Map Hypergraph

A Topic Map Hypergraph is a triple G = (S, H, r) where

  • 1.1. S is a set called the "set of subjects"

RM4TM requirement of "Subject Location Uniqueness" is achieved in S.
Each element of S is a single subject, each subject in the Topic Map is an element of S.

  • 1.2. H is an hypergraph called "hypergraph representation" of S

The hypergraph is used to represent the structure of assertions.
Note that "Subject Location Uniqueness" is not achieved in the hypergraph, since the same subject can be represented by two elements of the hypergraph, for example by a vertex in some place, and by an edge in another one (for an assertion).

For clarity of the model, S and H will be considered as disjoint sets.
A formal definition of an hypergraph is given below.
See also a less formal introduction, with illustration and example

An hypergraph is a 5-uple H = ( V , lV , E , lE , I ) where:
  • V, E, I are disjoint sets, respectively the vertex set, the edge set and the incidence set of the hypergraph.
  • The vertex-connector lV is a mapping from V to the set P(I) of all the subsets of I (which means that the image of a vertex is a subset of I), such that lV (V) is a partition of I (this means that two different vertices are mapped to disjoint subsets of I, and that the union of all the images of the vertices in V is equal to I itself).
  • The edge-connector lE is a mapping from E to the set P(I), such that lE(E) is a partition of I
Put it otherwise : for every incidence i, there exists exactly
- one vertex v such that i belongs to lV(v)
- one edge e such that i belongs to lE(e)
The vertex v and the edge e connected by i are called incident to each other.
  • 1.3. r is a function called "representation function"
    Domain of
    r
    is a part of H. Range of r is S.
    If s is a subject in S, and x an element of H, s = r(x) reads "s is represented by x" or "x represents s"

Remarks

  • Each subject in S is represented by at least one element in H.

A subject may be represented by two elements of H, following the rules set in section 2 below.

  • Each element of H (vertex, incidence, or edge) represents at most one subject in S.

The integrity of the hypergraph structure implies that each incidence has to be connected to exactly one vertex and one edge, but RM4TM allows casting without role player. In such a case, the hypergraph will contain a "blank" vertex corresponding to the empty slot, (this vertex will represent no subject).

2. Rules of representation

Let V, E, and I the vertex set, edge set and incidence set of H, respectively.

2.1. A subject is represented by at most one vertex (element of V)

That means two distinct vertices can't represent the same subject.
But a subject represented by an edge or an incidence may or may not be represented by a vertex (see below).

2.2. The restriction of r to E U I is a bijection

Let A = r(E). An element of A is called an assertion.
Let C = r(I). An element of C is called a role-casting.

A consequence of 2.2 is that C and A are disjoint, since E and I are disjoint.

Consequences of r being a bijection on E U I

  • Every edge represents exactly one assertion, and distinct edges represent distinct assertions.
  • An assertion is represented by exactly one edge, and distinct assertions are represented by distinct edges.
  • Every incidence represents exactly one role-casting, and distinct incidences represent distinct role-casting.
  • A role-casting is represented by exactly one incidence, and distinct role-casting are represented by distinct incidences.

To sum it up simply ...

An assertion is represented by exactly one edge, and at most one vertex.
A role-casting is represented by exactly one incidence, and at most one vertex.
Other subjects are represented by at most one vertex.

The above structure represents only the Cx and CA arcs of RM4TM.
AT and CR arcs are represented as simple relations in S - see following section.

4. Type

4.1. Definition

Type is a binary relation T defined on S. Notation : T(x, y)

Read "The type of x is y" or "x is instanceOf y" or "x isA y"

Remarks
1.
T(x,y) can be considered as an assertion (with class and instance role type), and represented as such in the hypergraph. But this relation has to be generic, to avoid recursivity issues (typing class and instance role type).
2. Type applies to assertions, role-casting and other subjects, with specific constraints (see below).
The properties of Type match the properties of AT and CR arcs in RM4TM.

4.2. Cardinality

The following table gives the cardinality of the type, in conformance to RM4TM

x
Assertion
Role-casting
Other Subject
Type of x
0 or 1
1
Unbounded

4.3. Assertion Types and Role Types

  • An assertion type is a subject t such as for some assertion a, T(a,t)
  • A role type is a subject r such as for some role x, T(x,r)

Let T and R the sets of assertion types and role types respectively.
To meet RM4TM requirements A, C, T and R are disjoint from each other.

EXAMPLE
Note that the assertion a1 is represented twice, by the edge e1, and by the vertex v3

5. Subject Properties

5.1. Definition
A property is a function p of which domain is a part of S.
Range of p is called the set of p-values.

The model does not constrain the nature of the set of p-values. They can be elements of S or not.

5.2. Identifying properties
A property which is injective (one-to-one) on its domain is called an identifying property
.

  • p is an an identifying property if and only if two distinct subjects have distinct p-values.
  • If p is an an identifying property, p-values are identifiers of subjects.

6. Merging

Given two Topic Map Hypergraphs G1 = (S1, H1, r1) and G2 = (S2, H2, r2)
G1 and G2 are merged into G = (S, H, r) , under the following conditions.

m is a function from S1 U H1 U S2 U H2 to S U H, called "merging function"

For all:
s1 element of S1 and x1 element of H1, such as s1 = r1(x1)
s2 element of S2 and x2 element of H2, such as s2 = r2(x2)
p1 identifying property defined on S1
p2 identifying property defined on S2

If p1(s1) = p2(s2) then:
m
(s1) = m(s2) = s
m(x1) = m(x2) = x
s =
r(x)

Remarks:
1. To apply the conditions of merging, one need to have a common range of p-values for p1
and p2.
2. Having the same p-value for some identifying property is a sufficient condition for merging, but not a necessary one, allowing extra merging to be based on properties external to the two original TMH.
G1 and G2 are merged into G ... but G is not uniquely defined by the merging.

 
© 2002 Mondeca - univers immedia