NIEM was designed to share information across domains (communities of interest or lines of business). You can certainly consider both internal and external data requirements, but you should definitely identify data requirements for sharing information with communities and organizations outside your own. Ask yourself: with whom outside your domain do you share information? And who from outside your domain shares information that you need?
It helps to identify or develop simple scenarios, and within those scenarios identify common use cases for sharing information. It also helps to examine existing database schemas, data dictionaries, XML schemas, flat files, paper/electronic forms, workflows, etc. for data requirements. Such data sources can provide insights into what data is currently shared and how.
The sources will likely contain many variations in data names and definitions. To create a good domain model it is necessary to harmonize, i.e., decide on a single name, definition, and structure (type) for each data element, and eliminate duplication. Then map the data model elements (and types) back to their authoritative sources (data dictionaries, database schemas, forms, etc.) and record this mapping for reference. This mapping will likely become a critical resource for programmers who implement information exchanges with the domain model and may have to trace back to the legacy data sources.
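As one way to record such a mapping, the sketch below keeps each harmonized element alongside the legacy fields it replaces and builds a reverse index for tracing a legacy field back to the model. It is a minimal illustration and not a NIEM tool: the class name, the source paths (hr_db.employee.dob and so on), and the definition text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HarmonizedElement:
    """One harmonized data element plus the legacy sources it was derived from."""
    name: str
    definition: str
    type_name: str
    sources: List[str] = field(default_factory=list)


# Hypothetical example: three legacy fields harmonized into a single element.
MODEL = [
    HarmonizedElement(
        name="PersonBirthDate",
        definition="A date a person was born.",
        type_name="DateType",
        sources=[
            "hr_db.employee.dob",
            "intake_form.date_of_birth",
            "legacy_export.csv:BIRTH_DT",
        ],
    ),
]


def build_source_index(model: List[HarmonizedElement]) -> Dict[str, str]:
    """Map each legacy source field to the harmonized element that replaces it."""
    index: Dict[str, str] = {}
    for element in model:
        for source in element.sources:
            if source in index:
                # A legacy field claimed by two elements means harmonization is incomplete.
                raise ValueError(f"{source} is mapped to more than one element")
            index[source] = element.name
    return index


SOURCE_INDEX = build_source_index(MODEL)
print(SOURCE_INDEX["intake_form.date_of_birth"])
```

Keeping the mapping as data rather than prose makes it easy to export alongside the model and to detect a legacy field that was accidentally mapped twice.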
Model data components for real information requirements: those that are known to exist or that you know are necessary, based on actual information sharing scenarios or use cases.
Do not create NIEM data components for every possible contingency or likelihood. Do not create components that might be nice-to-have or that are “likely” future requirements. That said, this does not mean you shouldn’t model new data requirements that are definitely in near-term plans.
Note that it is important to consider real information exchange scenarios and associated use cases that will identify both the existing and the new near-term requirements. If possible, envision what the domain should look like in the (not too distant) future and build the to-be model from the as-is baseline.
Scale back the effort rather than create data components that may have to be deleted or changed in later release cycles, and will subsequently confuse your domain community. Try to ensure the key domain classes (object types) are present in the model, but it is not necessary to be absolutely complete. It is easy to add properties to a type later (in a release or domain update). Furthermore, NIEM types are easily extended with additional elements in IEPDs using augmentation points. IEPD extensions confirm the need for new requirements, and feed them back to the reference model for future additions to NIEM.
Grow your domain model over time as you build NIEM experience. NIEM allows you to publish a domain update anytime outside of the annual release cycle. So, there is always time to catch up and never a need to rush for NIEM’s sake.
Do not overbuild data components. Keep them simple: A type represents a real world object or concept. Elements describe the characteristics or parts of that object or concept. A complex type has elements, elements are typed, and so on down to primitive simple elements (of type string, text, name, date, amount, or token, etc.). For example:
All NIEM elements are defined with complexTypes that are extensions or descendants of a complex base type in structures.xsd. These base types contain a simple object attribute group that supports NIEM built-in capabilities such as metadata and referencing.
Code lists also contain the simple object attribute group; however, code lists get that attribute group somewhat differently from elements. Code lists require both a complex type and a simple type. Each code element is defined by an associated CodeType (for example, EyeColorCode is defined by EyeColorCodeType). This CodeType was derived from a CodeSimpleType (following the example, EyeColorCodeSimpleType). The CodeSimpleType contains the code values as XML enumerations, while the CodeType extends the CodeSimpleType by adding the simple object attribute group, and the CodeType becomes an XML complexType with simple content.
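The two-type code list pattern can be sketched as a small schema fragment. The Python below embeds one and reads it back to show which type holds the enumerations and which type adds the attribute group. Treat it as an assumption-laden illustration: the code values (BLU, BRO, GRN) are made up, and the structures namespace URI and the SimpleObjectAttributeGroup name follow NIEM 3.0 conventions and should be checked against the release you target.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative fragment only; code values are hypothetical, and the structures
# namespace URI is assumed to be the NIEM 3.0 one.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:structures="http://release.niem.gov/niem/structures/3.0/">
  <xs:simpleType name="EyeColorCodeSimpleType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="BLU"/>
      <xs:enumeration value="BRO"/>
      <xs:enumeration value="GRN"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="EyeColorCodeType">
    <xs:simpleContent>
      <xs:extension base="EyeColorCodeSimpleType">
        <xs:attributeGroup ref="structures:SimpleObjectAttributeGroup"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
"""

ns = {"xs": XS}
root = ET.fromstring(SCHEMA)

# The simple type carries the code values as enumerations.
simple = root.find("xs:simpleType[@name='EyeColorCodeSimpleType']", ns)
code_values = [e.get("value") for e in simple.iterfind(".//xs:enumeration", ns)]

# The complex type has simple content: it extends the simple type and only
# adds the attribute group.
extension = root.find(
    "xs:complexType[@name='EyeColorCodeType']/xs:simpleContent/xs:extension", ns
)
base = extension.get("base")
attribute_group = extension.find("xs:attributeGroup", ns).get("ref")

print(code_values, base, attribute_group)
```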
xml:lang is to an element of type TextType.
with the Core (or with other domains if inputting a domain update).
XML Schema document (XSD) or Change Request (XLS) format.
A scalable vocabulary that will be used by many different communities to exchange information must be understandable to all parties involved. To facilitate consistency and understanding, NIEM has established rules for naming and defining its data components. These rules apply to all types, elements, and attributes. They were derived from ISO/IEC Standard 11179, Information Technology – Metadata Registries (MDR). This standard has been around since the 1990s and continues to be updated. For this reason, do not expect the NIEM rules for names and definitions to be exactly synchronized with Standard 11179. For the most part, however, the NIEM Naming and Design Rules (NDR) still follow 11179 rules and guidance for designing metadata definitions and names.
Each NIEM element, attribute, and type must be clearly defined before it will be accepted for a NIEM domain update or release. ISO/IEC Standard 11179 Part 4 is the guidance upon which NIEM definitions are formulated. The salient points of that guidance and the NIEM NDR rules are repeated here:
Each data component definition must be unique from all others and distinguishable in meaning. No two definitions can be identical in wording or so close in meaning that they could refer to the same data component.
Try to keep definitions simple and straightforward. This is not always possible, but at least make them understandable to others who are not a part of your community of interest (i.e., domain).
Element definitions almost always begin with an indefinite article (i.e., “a” or “an”), never a definite article (i.e., “the”).
Since it is often the case that a type and an element of that type can be defined with identical or similar words (for example, Person and PersonType), it is a NIEM best practice to begin a type definition with the phrase “A data type for …” This ensures that the definition for the element and its associated type are easily distinguishable.
If you have trouble designing a good definition for a data component, refer to the current NIEM release for examples.
Avoid using the terms in the name of a data component to define it. That said, a good data component name may be self-defining. If there are no good synonyms to employ in the definition, and you must use one or more terms from the name, it is not an error.
Aside from the typing implications of opening phrases mentioned above, do not put data typing information in a data component definition. An example of a bad definition is: SocialSecurityID - “A 9-digit number with hyphens that identifies a person in the U.S.” The fact that this element is a “9-digit number with hyphens” should not be included in its definition. In most cases separators are meaningless and unnecessary visual aids for human readability. The correct way to constrain this identifier is to use the xs:pattern facet to restrict it to nine digits without hyphens. Note that non-alphanumeric characters may be used within identifiers if they are an integral part of the identifier itself (for example, passwords), and are NOT merely being used as visual separators.
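To make the separation concrete, here is a small sketch of the same constraint expressed outside the definition. It assumes the schema would carry a pattern such as [0-9]{9} and mirrors that check in Python. The function name normalize_ssn and the choice to strip hyphens before validating are illustrative assumptions, not NIEM requirements.

```python
import re

# Python analogue of <xs:pattern value="[0-9]{9}"/>. An xs:pattern must match
# the whole value, so re.fullmatch is used rather than re.search.
SSN_PATTERN = re.compile(r"[0-9]{9}")


def normalize_ssn(raw: str) -> str:
    """Strip hyphen separators and return the bare nine-digit identifier."""
    digits = raw.strip().replace("-", "")
    if SSN_PATTERN.fullmatch(digits) is None:
        raise ValueError(f"not a nine-digit identifier: {raw!r}")
    return digits


print(normalize_ssn("123-45-6789"))
```

The definition can then say only what the identifier means, while the format rule lives in the schema (or, as here, in validation code).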
Based on the foregoing, it is good practice to avoid use of the word “type” within definitions, because in most cases “type” refers to data typing. Instead, in the appropriate cases, a definition should use terms such as kind, class, category, nature, genre, or form to refer to classifications (another relatively common word-sense of “type”).
Code list type definitions - A code list must have a definition for both its associated CodeType and CodeSimpleType. Both of these datatypes can have the same definition (one of the few exceptions to the unique definition rule) since they are semantically the same type. The difference is that the complex type extends the simple type to add several common properties that are part of the NIEM infrastructure. The definition should NOT refer to the code values or the code literals. For example, the definition for DayOfWeekCodeSimpleType could be “A code for a day in a week”. It should NOT include Su=Sunday, Mo=Monday, etc. These values and associated literals will be recorded in the xs:enumeration elements within the XSD for the CodeSimpleType.
Type definitions should describe what a type is, not list and define its contents. Describe it as an object, not a container of attributes. For example, VehicleType:
VehicleType - (bad definition) A data type that contains the following (properties or characteristics) VehicleColorInteriorText, VehicleDoorQuantity, VehicleIdentification, VehicleMake, VehicleModel, …
VehicleType - (good definition) A data type for a means of ground transportation designed to carry an operator, passengers, and/or cargo.
Neoskizzle:
Neoskizzle container containing many references to elements as defined in NeoskizzleType.
Neoskizzle contains identification information as well as many other characteristics. See type definition.
Neoskizzle containing necessary data elements.
Neoskizzle related information.
At this point, do you have any idea what “neoskizzle” means or is by reading any one of the definitions above? Of course not. So, here is an example of a good definition: A person who takes part in an event, activity, meeting, or other social function. Apparently, a synonym for “neoskizzle” is “participant”. So, why not use the term “Participant” for the element name? (By the way, in case you hadn’t guessed, there is no such word as “neoskizzle”. It’s made up.)
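Several of the definition rules above are mechanical enough to check automatically. The sketch below is one hypothetical way to do so and is not part of any NIEM tooling. It assumes that any component whose name ends in Type is a NIEM type, and it checks only the opening phrase and the use of the word “type”; whether a definition is actually meaningful still needs a human reader.

```python
import re
from typing import List

TYPE_PREFIX = "A data type for "


def check_definition(name: str, definition: str) -> List[str]:
    """Return the rule violations found in one component definition."""
    problems = []
    text = definition.strip()
    if name.endswith("Type"):
        # Assumption: a name ending in "Type" denotes a NIEM type.
        if not text.startswith(TYPE_PREFIX):
            problems.append('type definition should begin with "A data type for"')
        # Ignore the conventional opening phrase when scanning for "type".
        body = re.sub(r"^A data type\b", "", text)
    else:
        if re.match(r"the\b", text, re.IGNORECASE):
            problems.append("definition should not begin with a definite article")
        body = text
    if re.search(r"\btype\b", body, re.IGNORECASE):
        problems.append('avoid the word "type"; consider kind, class, or category')
    return problems


print(check_definition(
    "VehicleType",
    "A data type that contains the following properties",
))
```

The good VehicleType definition above passes this check with an empty list, while a definition such as “The type of person at an event” is flagged twice.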
The NIEM NDR provides fairly clear rules and guidance regarding the naming of data components. The most important NDR sections for understanding NIEM data naming are:
The syntax of NIEM data names comes from ISO/IEC 11179 Part 5.
A single NIEM data name may consist of a number of terms. A term is a meaningful word, an abbreviation for a word, or an acronym. Word terms are one of:
ID, the authorized abbreviation for Identifier.
URI, the authorized abbreviation for Uniform Resource Identifier.
In accordance with ISO/IEC 11179 Part 5, terms that make up a NIEM data name are classified into four basic parts according to their placement and function. A term in a data name can be one of these:
Example: VehicleTrafficControlDeviceCategoryCode - A kind of traffic control device (TCD) applicable to this motor vehicle at the crash location.
Vehicle = object term
Traffic, Control, Device = all qualifier terms modifying Category
Category = property term
Code = representation term
Avoid use of the term Type (except as a representation term to identify the name of a NIEM type). This is reserved for the representation term indicating data typing; instead use Category. Because it is an extremely common concept in all domains, the only current exception is BloodType.
Avoid use of the term Number in data names. Its use is usually too generic to be meaningful or helpful, unless a name is extremely common across all domains (not just a few). For example, the data name TelephoneNumber is common across all domains; in this case, it is most useful for clear meaning. Otherwise, a number should usually be an ID (Identifier), Quantity, Numeric (or Value), Amount, Measure, or Duration (of time) (See: https://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/niem-ndr-3.0.html#section_10.8.7).
Do NOT use double terms (i.e., consecutive identical terms such as TypeType or NameName) unless such a term has a very specific meaning. Double terms should be replaced with a single instance of the term (as long as this does not detract from the real meaning).
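The term structure and the three naming cautions above (Type, Number, and double terms) can be sketched as a small name checker. This is a hypothetical helper, not an NDR conformance tool. The representation-term set is a partial list drawn from the terms mentioned in this guidance, and classify_terms uses a purely positional heuristic (first term is the object term, last is the representation term, the one before it is the property term) that happens to fit the example name but will not hold for every NIEM name.

```python
import re
from typing import Dict, List

# Partial list, limited to representation terms mentioned in this guidance.
REPRESENTATION_TERMS = {
    "Amount", "Code", "Date", "Duration", "ID", "Indicator", "Measure",
    "Name", "Numeric", "Quantity", "Text", "Value",
}
NUMBER_EXCEPTIONS = {"TelephoneNumber"}


def split_terms(name: str) -> List[str]:
    """Split an upper camel case name into terms, keeping acronyms such as ID whole."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z][a-z0-9]+|[A-Z]+", name)


def classify_terms(name: str) -> Dict[str, object]:
    """Positional heuristic only: object, qualifiers, property, representation."""
    terms = split_terms(name)
    if len(terms) < 3 or terms[-1] not in REPRESENTATION_TERMS:
        raise ValueError(f"cannot classify {name!r} with the positional heuristic")
    return {
        "object": terms[0],
        "qualifiers": terms[1:-2],
        "property": terms[-2],
        "representation": terms[-1],
    }


def lint_name(name: str) -> List[str]:
    """Flag double terms and questionable uses of Type and Number."""
    terms = split_terms(name)
    problems = []
    for i, term in enumerate(terms):
        if i > 0 and term == terms[i - 1]:
            problems.append(f"double term: {term}{term}")
        is_last = i == len(terms) - 1
        follows_blood = i > 0 and terms[i - 1] == "Blood"  # BloodType exception
        if term == "Type" and not is_last and not follows_blood:
            problems.append("Type used as a non-representation term; consider Category")
        if term == "Number" and name not in NUMBER_EXCEPTIONS:
            problems.append("Number is too generic; consider ID, Quantity, Numeric, Amount, or Measure")
    return problems


print(classify_terms("VehicleTrafficControlDeviceCategoryCode"))
print(lint_name("VehicleTypeCode"))
```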
ID (Identifier) vs. Identification - ID and Identification elements are easy to confuse. An ID is a string element that uniquely identifies an entity, so an ID has simple content. An Identification element is a set of subelements. For example, an Identification element for a person usually has subelements such as PersonName, PersonHeight, PersonWeight, PersonEyeColor, PersonHairColor, IssueDate, ExpirationDate, etc., so an Identification element has complex content (i.e., subelements). Usually one or more of its subelements will be ID elements.
Text(Type) vs. Name(Type) - In the construction of NIEM element names, Name and Text are authorized representation terms (of type NameType and TextType respectively). A Name is a word or phrase that constitutes the distinctive designation of a specific person, place, thing, or concept. It is not necessarily an identifier; for example, there are multiple persons with the name “Bob”. Text is a word or phrase in some language (usually English).
Date(Type) - The only date and time format supported by W3C XML Schema is a subset of ISO 8601. NIEM uses this for date and time.
Indicator(Type) - This term designates NIEM Booleans whose valid values are TRUE or FALSE. Do NOT change these values to YES/NO or 1/0. If required, translation to and from TRUE or FALSE is not difficult.
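Translating legacy values into these representations is usually a few lines of code. The sketch below assumes the indicator is carried as an xs:boolean, whose literal forms are the lowercase true and false, and that legacy dates arrive as MM/DD/YYYY strings. The accepted token sets and both function names are illustrative assumptions, so adjust them to your own sources.

```python
from datetime import datetime

# Hypothetical sets of legacy spellings accepted on input.
TRUE_TOKENS = {"true", "yes", "y", "1"}
FALSE_TOKENS = {"false", "no", "n", "0"}


def to_indicator(value) -> str:
    """Translate a legacy yes/no or 1/0 value to an xs:boolean literal."""
    token = str(value).strip().lower()
    if token in TRUE_TOKENS:
        return "true"
    if token in FALSE_TOKENS:
        return "false"
    raise ValueError(f"not a recognizable Boolean value: {value!r}")


def to_xs_date(value: str, source_format: str = "%m/%d/%Y") -> str:
    """Convert a legacy date string to the ISO 8601 form used by xs:date."""
    return datetime.strptime(value, source_format).date().isoformat()


print(to_indicator("YES"), to_indicator(0), to_xs_date("03/07/2014"))
```

Doing the translation at the boundary keeps the exchanged data in one canonical form, so receivers never have to guess which local convention a sender used.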
Best examples of data names are in the current NIEM releases, in particular NIEM Core niem-core.xsd v3.0, or the Core from the most current major release.