Hall D Internal Note

HDDM - Hall D Data Model

draft 1.0

Richard Jones
May 25, 2001

Fig. 1: The conceptual data model for Hall D begins with a physics event, coming either from the detector or a Monte Carlo program, which evolves increasing structure as it flows through the analysis pipeline. The data model specifies the elements of information that are contained in a view of the event at each stage and the relationships between them. The implementation provides standard methods for creating, storing and accessing the information.

General Notes:

  1. At each stage (lower-case items in diagram) in the pipeline one has a unique view of the event.
  2. To each of these is associated a unique data model that expresses the event in that view.
  3. In Hall D the data model is given a logical description as a hierarchical structure, and that structure is written down as an xml document.
  4. The xml data model document is used for two purposes: to provide a template for formatting a complete event in xml, and to generate code that supports the data model in a programming environment.
  5. To actually use the data model, it has to be expressed in some language we all can use. Right now that seems to be c for most people, so the first tools to be written focus on expressing the data model in terms of c structures defined in a c header file.
  6. A tool called hddm-c automatically constructs a set of c-structures from the xml data model that can be used to express the data in memory. It also provides an i/o interface for piping the structures between applications or saving them to disk. The stream is essentially just the xml with redundancy suppressed, and it contains all of the necessary xml documentation for unpacking the data on the other end.
  7. Tools called stdhep2hddm and hddm2stdhep provide conversion between the hddm data stream and the stdhep format used by HDFast.
  8. Tools called hddm-xml and xml-hddm provide conversion between the hddm data stream and plain-text xml documents. The hddm-xml tool can read any hddm data file written by anyone, without recompilation and without help from outside documents because sufficient documentation is provided in the stream header to completely construct the file in xml.

Rules for constructing data model documents:

  1. Data model documents must be well-formed xml, with the top tag of the document being <HDDM>. Because one is free to choose any tag name to describe the data, there is no dtd. Verification that a document conforms to the rules set out below is provided with the hddm i/o and converters toolkit.
  2. A tag of a given name is written only once at a given layer in the hierarchy of the hddm document. If the element may appear more than once in a given context, a special repeat attribute should be specified within the tag. For example, <tag repeat="*"> allows any number of repetitions of the element <tag> in its context.
  3. If a given tag is repeated within a context then the order of those repetitions is significant in the model, and preserved during i/o. The order of dissimilar tags at a given layer is not significant; listing them in any order in the hddm document is equivalent.
  4. If a given tag appears more than once anywhere in the data model then the list of attributes and content elements must agree in all instances.
  5. All elements in the model document are either empty or contain other elements. Any numeric or textual content between the open and close tags is treated as a comment and ignored.
  6. Any datum which takes on a simple value can be expressed in the document model as attributeName="type" where attributeName should be some xml attribute identifier name that is descriptive of the datum.
  7. All quantities in the data model are carried by named attributes of elements. The rest of the document exists to express the meaning of the data and the relationships between them.
  8. The generic model requires a datum described by an attribute to always be present. A datum which may sometimes be absent can be described in the model by an embedded element bearing the attribute.
  9. The "type" of an attribute is restricted to a small set of scalar c types. Currently "int", "float" and "double" are supported. Some common enum types are also recognized, including "bool" and "Particle_t". New types can be added, but doing so requires modifying the hddm i/o and converters code.
  10. Any unrecognized "type" values in the data model document are treated as constants and are simply carried around with the tag, but do not appear as variables in the memory representation and do not take up space in the binary stream. One use for them is as a mechanism to attach version information to the data model document.
  11. Two data models that differ in any way in their tag structure or attribute lists are considered different. The text areas between open and close tags are ignored, so comments or written documentation (e.g. extra tags that have been commented out) may be inserted or removed there without changing the data model.
  12. Two different data models are considered compatible with each other if there are no collisions between them, i.e. no tags with the same name but different attribute lists or contents. Two models are declared to be compatible by assigning them the same value for the "class" attribute of the <HDDM> tag.
  13. The data model supports a simple inheritance mechanism through the "class" attribute of the document. Using the standard xml include file mechanism, a hierarchy of increasingly complex data models can be built up from simple components that are common to all models in the class. See examples below for details.
  14. When a single program needs simultaneous access to data from two sources with incompatible data models, the collision is avoided by assigning them different values for the "class" attribute of the <HDDM> tag.
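
Rules 1, 2 and 4 above are mechanical enough to check by program. The following is a minimal sketch in Python using only the standard library; it is not the verification code shipped with the hddm toolkit, and the names check_model and definition are hypothetical. It assumes that the repeat attribute counts as part of a tag's attribute list when instances are compared.

```python
import xml.etree.ElementTree as ET


def definition(elem):
    """Attribute list and child tag names that define one instance of a tag."""
    # Assumption: the repeat attribute counts as part of the attribute list.
    return (frozenset(elem.attrib.items()),
            frozenset(child.tag for child in elem))


def check_model(xml_text):
    """Return a list of violations of rules 1, 2 and 4 (empty if none found)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    # Rule 1: the top tag of the document must be <HDDM>.
    if root.tag != "HDDM":
        problems.append("top tag is <%s>, expected <HDDM>" % root.tag)
    first_seen = {}
    for elem in root.iter():
        # Rule 2: a tag name is written only once at a given layer.
        names = [child.tag for child in elem]
        for name in sorted(set(names)):
            if names.count(name) > 1:
                problems.append("<%s> written more than once inside <%s>"
                                % (name, elem.tag))
        # Rule 4: every instance of a tag agrees on attributes and contents.
        if first_seen.setdefault(elem.tag, definition(elem)) != definition(elem):
            problems.append("<%s> is defined inconsistently" % elem.tag)
    return problems
```

A document that satisfies the three rules yields an empty list; each violation is reported as one line of text naming the offending tag.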

Implementation Notes:

  1. The binary file format will change. The point is not to fixate on some absolute binary format at this early stage. The only design constraint was that it adhere closely to the xml and be readily converted into plain-text xml without auxiliary files or access to the source code of the program that wrote it.
  2. The design has been optimized for flexibility: the user can request only the part of the model that is of interest. The entire model does not even have to be present in the file, in which case only the parts of the tree that are present in the file are loaded into memory.
  3. The only constraint between the model used in the program and that of the hddm stream is that there be no collisions, that is, no tags found in both but with different attributes or contents.
  4. Two data models with colliding definitions can be used in one program, but they must have different class="X" IDs. Two streams with different class IDs cannot feed into each other. In any case, the xml viewing tool hddm-xml can read an hddm stream of any class.
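
The collision test described in these notes can be illustrated with a short Python sketch. It is not taken from the hddm toolkit, and the function names tag_definitions and collisions are hypothetical. It applies the definition literally: a tag collides if it appears in both models with a different attribute list or different child tags. The top <HDDM> tag is skipped on the assumption that its class and version attributes describe the document rather than the data. The real toolkit may well be more permissive, for instance toward a program that requests only part of a container's contents.

```python
import xml.etree.ElementTree as ET


def tag_definitions(xml_text):
    """Map each tag name to its (attributes, child tag names) definition."""
    root = ET.fromstring(xml_text)
    defs = {}
    for elem in root.iter():
        if elem is root:
            continue  # assumption: the top tag is document metadata
        attrs = frozenset(elem.attrib.items())
        children = frozenset(child.tag for child in elem)
        # Keep the first definition seen; rule 4 says later ones must agree.
        defs.setdefault(elem.tag, (attrs, children))
    return defs


def collisions(model_a, model_b):
    """Sorted names of tags defined differently in the two models."""
    a = tag_definitions(model_a)
    b = tag_definitions(model_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


program_model = '<HDDM class="s"><hit t="float" dE="float" repeat="*"/></HDDM>'
stream_model = '<HDDM class="s"><hit t="float" repeat="*"/></HDDM>'
print(collisions(program_model, stream_model))
```

Here the two models disagree on the attribute list of <hit>, so that tag is reported. Under the rules above, two such models could only be used in one program if they were given different class values.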

Examples:

  1. A simple model of an event fragment describing hits in a time-of-flight wall. It allows for multiple hits per detector in a single event, with t and dE information for each hit. The hits are grouped by horizontal slab and, within each slab, by side (right: end=0, left: end=1). The repeat="*" attributes allow those tags to appear any number of times, or not at all, in the given context.
    <forwardTOF>
      <slab y="float" repeat="*">
        <side end="int" repeat="*">
          <hit t="float" dE="float" repeat="*" />
        </side>
      </slab>
    </forwardTOF>
    
  2. A model of the output from an event generator. An example of actual output from genr8 converted to xml using hddm-xml. Warning: Netscape has difficulty displaying plain xml; Internet Explorer gives a nice view of the document.
    <?xml version="1.0"?>
    
    <HDDM class="s" version="1.0">
      <physicsEvent eventNo="int" runNo="int">
        <reaction type="int" weight="float" repeat="*">
          <beam type="Particle_t">
            <momentum px="float" py="float" pz="float" E="float" />
            <properties charge="int" mass="float" />
          </beam>
          <target type="Particle_t">
            <momentum px="float" py="float" pz="float" E="float" />
            <properties charge="int" mass="float" />
          </target>
          <vertex repeat="*">
            <product type="Particle_t" decayVertex="int" repeat="*">
              <momentum px="float" py="float" pz="float" E="float" />
              <properties charge="int" mass="float" />
            </product>
            <origin vx="float" vy="float" vz="float" t="float" />
          </vertex>
        </reaction>
      </physicsEvent>
    </HDDM>
    
  3. A more complex example showing a hits tree for the full detector. An example is coming soon showing output from the Geant simulation in this view.
    <?xml version="1.0"?>
    
    <HDDM class="s" version="1.0">
      <physicsEvent eventNo="int" runNo="int">
    
        <hitView version="1.0">
          <barrelDC>
            <cathodeCyl radius="float" repeat="*">
              <strip sector="int" z="float" repeat="*">
                <hit t="float" dE="float" repeat="*" />
              </strip>
            </cathodeCyl>
            <ring radius="float" repeat="*">
              <straw phim="float" repeat="*">
                <hit t="float" dE="float" repeat="*" />
                <point z="float" dEdx="float" phi="float"
                            dradius="float" repeat="*" />
              </straw>
            </ring>
          </barrelDC>
        
          <forwardDC>
            <package pack="int" repeat="*">
              <chamber module="int" repeat="*">
                <cathodePlane layer="int" u="float" repeat="*">
                  <hit t="float" dE="float" repeat="*"/>
                  <cross v="float" z="float" tau="float" repeat="*" />
                </cathodePlane>
              </chamber>
            </package>
          </forwardDC>
        
          <startCntr>
            <sector sector="float" repeat="*">
              <hit t="float" dE="float" repeat="*" />
            </sector>
          </startCntr>
        
          <barrelCal>
            <module sector="float" repeat="*">
              <flash t="float" pe="float" repeat="*" />
            </module>
          </barrelCal>
            
          <Cerenkov>
            <module sector="float" repeat="*">
              <flash t="float" pe="float" repeat="*" />
            </module>
          </Cerenkov>
        
          <forwardTOF>
            <slab y="float" repeat="*">
              <side end="int" repeat="*">
                <hit t="float" dE="float" repeat="*" />
              </side>
            </slab>
          </forwardTOF>
        
          <forwardEMcal>
            <row row="int" repeat="*">
              <column col="int" repeat="*">
                <flash t="float" pe="float" repeat="*" />
              </column>
            </row>
          </forwardEMcal>
        </hitView>
      </physicsEvent>
    </HDDM>
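
To make the "template" role of a model document concrete, the sketch below checks a small event written in xml against the forwardTOF model from the first example. It is an illustration in Python, not part of the hddm tools. The event values are invented for the example, and the layout of the event (attribute values in place of type names, no repeat attribute) is an assumption about what such a document looks like. The sketch also assumes that a tag without repeat="*" may appear at most once, and it checks values only for the "int", "float" and "double" types.

```python
import xml.etree.ElementTree as ET

# Parsers for the scalar types checked here; other types are left unchecked.
PARSERS = {"int": int, "float": float, "double": float}

MODEL = """
<forwardTOF>
  <slab y="float" repeat="*">
    <side end="int" repeat="*">
      <hit t="float" dE="float" repeat="*" />
    </side>
  </slab>
</forwardTOF>
"""

# Invented values, for illustration only.
EVENT = """
<forwardTOF>
  <slab y="-12.5">
    <side end="0">
      <hit t="21.3" dE="0.0042" />
      <hit t="48.9" dE="0.0017" />
    </side>
    <side end="1">
      <hit t="22.1" dE="0.0039" />
    </side>
  </slab>
</forwardTOF>
"""


def conforms(model, event):
    """True if the event element matches the model element, recursively."""
    if model.tag != event.tag:
        return False
    declared = {k: v for k, v in model.attrib.items() if k != "repeat"}
    if set(event.attrib) != set(declared):
        return False
    for name, type_name in declared.items():
        parser = PARSERS.get(type_name)
        if parser is None:
            continue  # type not handled by this sketch
        try:
            parser(event.attrib[name])
        except ValueError:
            return False
    for child_model in model:
        matches = event.findall(child_model.tag)
        # Assumption: without repeat="*" a tag may appear at most once.
        if child_model.get("repeat") != "*" and len(matches) > 1:
            return False
        if not all(conforms(child_model, match) for match in matches):
            return False
    allowed = {child.tag for child in model}
    return all(child.tag in allowed for child in event)


print(conforms(ET.fromstring(MODEL), ET.fromstring(EVENT)))
```

The event above has two hits on the right end of one slab and one hit on the left end, which the repeat="*" attributes in the model permit.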
    

This material is based upon work supported by the National Science Foundation under Grant No. 0303512.