Supported file formats


AG format

The AG format is an XML representation of the annotation graph data model and is the official format for annotation graphs.

  • Format name: ATLAS Interchange Format level 0 or Annotation Graph format
  • AGLIB format name: AG
  • Input/Output: yes/yes
  • Resources: DTD, Annotation Graph Toolkit
  • Programming interface
    Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:AG
    filename:name of the file to load
    id:optional, ignored
    sigInfo:optional, ignored
    options:optional
    Option | Possible values | Description
    dtd | A path or URL to the DTD | Specifies the location of the DTD. If not present, the system id in the DOCTYPE declaration of the document is used by default.
    encoding | ASCII, UTF-8, UTF-16, UCS4, IBM037, IBM1140, ISO-8859-1, Windows-1252 | Forces the loader to treat the document as if it were in the specified encoding. By default, the encoding specified in the XML declaration is used. If none is given, defaults to UTF-8.
    DTDvalidation | true, false | Tells the loader whether or not to perform DTD validation. Default value is true.
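
The defaults in the option table above can be sketched as a small pre-flight check run before calling Load. This is only an illustration: `normalize_ag_load_options` is a hypothetical helper, not part of AGLIB, and it assumes that options are passed as string-to-string pairs. Rejecting unknown option names is a choice made for this sketch, not documented loader behavior.

```python
# Hypothetical helper (not an AGLIB function): applies the defaults documented
# in the AG Load option table and rejects values outside the listed sets.
AG_LOAD_ENCODINGS = {
    "ASCII", "UTF-8", "UTF-16", "UCS4",
    "IBM037", "IBM1140", "ISO-8859-1", "Windows-1252",
}


def normalize_ag_load_options(options=None):
    """Return a copy of `options` with the documented defaults filled in."""
    opts = dict(options or {})
    unknown = set(opts) - {"dtd", "encoding", "DTDvalidation"}
    if unknown:
        # Strictness is an assumption of this sketch.
        raise ValueError(f"unknown AG load option(s): {sorted(unknown)}")
    if "encoding" in opts and opts["encoding"] not in AG_LOAD_ENCODINGS:
        raise ValueError(f"unsupported encoding: {opts['encoding']}")
    # DTD validation is on unless explicitly disabled.
    opts.setdefault("DTDvalidation", "true")
    if opts["DTDvalidation"] not in ("true", "false"):
        raise ValueError("DTDvalidation must be 'true' or 'false'")
    return opts
```

The `dtd` and `encoding` keys are left absent when not supplied, since the table says the loader then falls back to the DOCTYPE system id and the XML declaration.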


    Store(format, filename, id, options)

    returns an empty string
    format:AG
    filename:name of the file to write
    id:AGId or AGSetId to be stored
    options:optional
    Option | Possible values | Description
    dtd | A path or URL to the DTD | Specifies the location of the DTD in the output file. Default value is http://agtk.sf.net/doc/xml/ag-1.1.dtd
    encoding | Any encoding allowed for XML | Writes the encoding in the XML declaration of the output file. If not given, no encoding is written in the XML declaration.
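
As a rough illustration of how the two Store options affect the head of the output file, the sketch below builds an XML declaration and DOCTYPE line. The helper name `ag_xml_prolog` is hypothetical, and the root element name `AGSet` is an assumption; check the DTD for the actual root element.

```python
# Hypothetical sketch: how the "dtd" and "encoding" Store options could shape
# the first two lines of an AG output file.
DEFAULT_AG_DTD = "http://agtk.sf.net/doc/xml/ag-1.1.dtd"


def ag_xml_prolog(options=None, root="AGSet"):
    """Build the XML declaration and DOCTYPE (root name is an assumption)."""
    opts = options or {}
    if "encoding" in opts:
        declaration = f'<?xml version="1.0" encoding="{opts["encoding"]}"?>'
    else:
        # No encoding option: the declaration carries no encoding attribute.
        declaration = '<?xml version="1.0"?>'
    dtd = opts.get("dtd", DEFAULT_AG_DTD)
    doctype = f'<!DOCTYPE {root} SYSTEM "{dtd}">'
    return declaration + "\n" + doctype
```

Note that the encoding option only changes what is written in the declaration; the table does not say that the output bytes are transcoded.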

ATLAS format

  • Format name: ATLAS Interchange Format level 1
  • AGLIB format name: ATLAS
  • Input/Output: yes/yes
  • Resources: jATLAS homepage, AIF DTD, MAIA
  • Programming interface
    Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:ATLAS
    filename:name of the file to load
    id:optional AGSetId. If not given or an empty string is given, ATLAS is assumed.
    sigInfo:optional, ignored
    options:optional
    Option | Possible values | Description
    MAIA | A path to the MAIA file | Specifies the location of the MAIA file. If not given, the value specified in the ATLAS file will be used.
    agann | ATLAS annotation type | ATLAS Annotations of this type, and their sub-Annotations, become AGs. Higher-level Annotations are put in an extra AG. If not given, the top ATLAS Annotation type in the hierarchy is used.
    DTDvalidation | true, false | Tells the loader whether or not to perform DTD validation. The DTD validation process includes validation of both the ATLAS file and the MAIA file. The default value is true.


    Store(format, filename, id, options)

    returns an empty string
    format:ATLAS
    filename:name of the file to write
    id:AGId or AGSetId to be stored
    options:optional
    Option | Possible values | Description
    encoding | Any encoding name allowed in XML | Writes the given encoding string in the XML declaration of the output file. If not given, the XML declaration of the output has no encoding specification.
    schemeLocation | A path or URL to the MAIA file | The value for the schemeLocation attribute of the Corpus tag. If not given, the AGSet's _AtlasSchemeLocation_ feature is used. If the feature is not available or invalid, the value of the MAIA option is used. The specified file need not exist.
    MAIA | A path or URL to the MAIA file | The MAIA file on which the annotation graphs to be stored are based. If not given, the value of schemeLocation is used. The file must actually exist; if it does not, the store process cannot proceed and an exception is thrown.
    Corpus/type | Corpus type as defined in the MAIA file | The corpus type that the annotation graphs are emulating. If not given, the AGSet's _AtlasCorpusType_ feature is used. If it is not available or invalid, an exception is thrown.
    Corpus/id | Corpus id | The corpus id written as the value of the id attribute of the Corpus element in the output. If not given, the AGSet's _AtlasCorpusID_ feature is used. If that is not available or invalid, the AGSetId is used. If that is not valid either, an exception is thrown.
    type_rename | AG Annotation type and ATLAS Annotation type pairs | If there is a mismatch between an AG Annotation type name and an ATLAS Annotation type name, this option can be used to map the different names.
    AnaIdMap | Analysis type, role and id triples | An ATLAS file defines a mapping between Analysis type and role and Analysis id. This option substitutes for the default mapping, which is established when an ATLAS file is loaded.
    DTDvalidation | true, false | Tells the function whether or not to perform DTD validation of the MAIA file. Default value is true.
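
The fallback chains described for schemeLocation, MAIA, Corpus/type and Corpus/id can be sketched as follows. The function and argument names are hypothetical, and the document does not define what makes a value "invalid"; this sketch simply treats a missing or empty value as unusable.

```python
# Hypothetical sketch of the documented fallback order for ATLAS Store options.
def _first_usable(*candidates):
    """Return the first non-empty candidate, or None."""
    for value in candidates:
        if value:  # assumption: empty or missing means "not available or invalid"
            return value
    return None


def resolve_atlas_store_settings(options, agset_features, agset_id):
    """Resolve the four values from options, AGSet features and the AGSetId."""
    scheme = _first_usable(
        options.get("schemeLocation"),
        agset_features.get("_AtlasSchemeLocation_"),
        options.get("MAIA"),
    )
    maia = _first_usable(options.get("MAIA"), scheme)
    corpus_type = _first_usable(
        options.get("Corpus/type"),
        agset_features.get("_AtlasCorpusType_"),
    )
    if corpus_type is None:
        raise ValueError("no usable corpus type")
    corpus_id = _first_usable(
        options.get("Corpus/id"),
        agset_features.get("_AtlasCorpusID_"),
        agset_id,
    )
    if corpus_id is None:
        raise ValueError("no usable corpus id")
    return {
        "schemeLocation": scheme,
        "MAIA": maia,
        "Corpus/type": corpus_type,
        "Corpus/id": corpus_id,
    }
```

The sketch does not check that the MAIA file exists, which the real store process requires.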


    Store2(format, filename, ids, options)

    returns an empty string
    format:ATLAS
    filename:name of the file to write
    ids:A list of AGIds to be stored
    options:See options of Store()


    Annotation graph model of ATLAS format

    The Atlas format extends the annotation graph data model in various ways. Above all, it generalizes the concept of an anchor: in Atlas, an anchor is not just a time mark but a point in a multi-dimensional space. Atlas also reintroduces the region, which specifies the portion of a signal to be annotated. A region can be expressed in complex ways, using anchors, annotations and even other regions.

    By restricting the target signals to linear ones, it is still possible to model the Atlas format as an annotation graph. The restrictions coded in the ATLAS class are as follows:

    • An anchor has one and only one parameter, of numeric type.
    • A region has exactly two anchor references. The role of one must be start and the role of the other must be end. The start anchor specifies the start offset and the end anchor specifies the end offset.

    Atlas also extends the content model of annotations in the annotation graph model, and this also needs to be restricted.

    • The content must be a list of parameters. The role of a parameter is mapped to an annotation feature name, and the text content of the parameter is mapped to the annotation feature value.
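
The restrictions above can be illustrated with a minimal sketch that converts one restricted Atlas annotation into an annotation-graph style record. The data shapes used here (dictionaries and tuples) and the function name are hypothetical and chosen for illustration; they are not the ATLAS class's actual internals.

```python
# Hypothetical sketch: checks the three restrictions and maps an Atlas
# annotation onto start offset, end offset and a feature dictionary.
def to_ag_annotation(anchors, region_refs, parameters):
    """
    anchors:     {anchor_id: [parameter values]}
    region_refs: [(role, anchor_id), ...] for the annotation's region
    parameters:  [(role, text), ...] forming the annotation content
    """
    roles = sorted(role for role, _ in region_refs)
    if roles != ["end", "start"]:
        raise ValueError("a region needs exactly one start and one end anchor reference")
    offsets = {}
    for role, anchor_id in region_refs:
        values = anchors[anchor_id]
        if len(values) != 1:
            raise ValueError(f"anchor {anchor_id} must have exactly one parameter")
        offsets[role] = float(values[0])  # raises ValueError if not numeric
    # Parameter role becomes the feature name, parameter text the feature value.
    features = {role: text for role, text in parameters}
    return {"start": offsets["start"], "end": offsets["end"], "features": features}
```

For example, two anchors at 0.5 and 1.25 and a single parameter with role `word` yield one annotation spanning 0.5 to 1.25 with the feature `word`.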

    Atlas makes the hierarchy of annotations explicit by specifying each annotation's children. In an annotation graph, such a hierarchy is implicit and incomplete. To solve this problem, the Atlas AG model introduces a special annotation feature, _AtlasAnnChil_. There are other special features as well; all are summarized below.

    • _AtlasAnnChil_ : Annotation feature. List of child annotations, e.g.
      token token ATLAS:AG1:Annotation1 ATLAS:AG1:Annotation2 ATLAS:AG1:Annotation3 ; token interruptionPoint ATLAS:AG1:Annotation3
      This example means the annotation has two groups of children. The first group contains children of type token whose role is token. The second group contains children of type token whose role is interruptionPoint.

      Important: Applications should manipulate the _AtlasAnnChil_ feature carefully. The value of that feature is critical in generating Atlas output.

    • _AtlasAnnID_ : Annotation feature. Atlas allows external references to annotations, so it is important to keep the original ids during conversion. This feature keeps the original id.
    • _AtlasRegID_ : Annotation feature. Atlas region id.
    • _AtlasStartAncID_ : Annotation feature. Atlas start anchor id.
    • _AtlasEndAncID_ : Annotation feature. Atlas end anchor id.
    • _AtlasCorpusID_ : AGSet feature (metadata). Atlas corpus id.
    • _AtlasSchemeLocation_ : AGSet feature (metadata). The location of the MAIA file.
    • _AtlasAnaMapStr_ : AGSet feature (metadata). Records original Atlas analysis ids.
    • _AtlasSignalID_ : Signal feature (metadata). Atlas signal id.
    • _AtlasCorpusType_ : AGSet feature (metadata). Atlas corpus type.
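
Since applications need to handle the _AtlasAnnChil_ value with care, a small parser and formatter may help. The sketch below is based only on the example value shown in the list above: groups are separated by `;`, and each group is a child type, a role, and then the child annotation ids, all separated by whitespace. It assumes that ids never contain whitespace or `;`. The function names are hypothetical.

```python
# Hypothetical sketch for reading and writing the _AtlasAnnChil_ feature value.
def parse_atlas_ann_chil(value):
    """Return a list of (child_type, role, [annotation ids]) groups."""
    groups = []
    for chunk in value.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue  # tolerate empty groups such as a trailing ';'
        if len(tokens) < 2:
            raise ValueError(f"group needs a type and a role: {chunk!r}")
        child_type, role, *ids = tokens
        groups.append((child_type, role, ids))
    return groups


def format_atlas_ann_chil(groups):
    """Inverse of parse_atlas_ann_chil, using ' ; ' between groups."""
    return " ; ".join(" ".join([child_type, role, *ids])
                      for child_type, role, ids in groups)
```

Parsing the example value gives two groups: one with role `token` and three ids, and one with role `interruptionPoint` and a single id. Editing the parsed groups and formatting them back keeps the feature value well formed.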

BAS format

  • Format name: BAS (Bavarian Archive for Speech Signals) Partitur format, v1.2.5
  • AGLIB format name: BAS
  • Input/Output: yes/no
  • Resources: Format documentation
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:BAS
    filename:name of the file to load
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional, ignored

BU format

  • Format name: Boston University Radio Speech Corpus format
  • AGLIB format name: BU
  • Input/Output: yes/no
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:BU
    filename:Common prefix of annotation file set to load
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional
    Option | Possible values | Description
    base | lbl | If both a .lba and a .lbl file exist, the loader uses the .lba file for base annotations by default. Use this option to force the loader to choose the .lbl file. If only one of them exists, there is no choice and the option is ignored.
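
The selection rule for the base annotation file can be sketched as below. The helper is hypothetical, and it assumes that file names are formed by appending the extension to the common prefix.

```python
# Hypothetical sketch of the .lba/.lbl choice described for the "base" option.
def choose_bu_base_file(prefix, available_exts, options=None):
    """Return the base annotation file name, or None if neither file exists."""
    opts = options or {}
    has_lba = ".lba" in available_exts
    has_lbl = ".lbl" in available_exts
    if has_lba and has_lbl:
        # Both exist: .lba wins unless the option forces .lbl.
        ext = ".lbl" if opts.get("base") == "lbl" else ".lba"
    elif has_lba:
        ext = ".lba"  # no choice, the option is ignored
    elif has_lbl:
        ext = ".lbl"  # no choice, the option is ignored
    else:
        return None
    return prefix + ext
```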

CAG format

The CAG format was developed to reduce the size of annotation files while remaining equivalent to the AG format. It can serve as an alternative to the AG format when storage space matters.

A CAG data set can be stored across several files. The Dictionary part contains the anchors, and the Annotation part contains the annotations, which reference the anchors in the Dictionary. An Annotation file should name the Dictionary files to include in its header; the loader processes the header to include the appropriate Dictionary files. The Annotation part can also be split by annotation type, for example by storing each type of annotation in a separate file. Those Annotation files can be included and referenced by other Annotation files.

  • Format name: Compact Annotation Graph format
  • AGLIB format name: CAG
  • Input/Output: yes/yes
  • Resources: BNF style format definition
  • Programming interface
    Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:CAG
    filename:name of the file to load
    id:optional, ignored
    sigInfo:optional, ignored
    options:optional
    Option | Possible values | Description
    compress | true, false | Tells the loader whether the input file is gzip-compressed. Default value is false. Note that this option is effective only when zlib is compiled into the plugin.


    Store(format, filename, id, options)

    returns an empty string
    format:CAG
    filename:output filename. Used only when neither the dictionary nor the annotation option is set.
    id:AGId or AGSetId to be stored
    options:optional
    Option | Possible values | Description
    compress | true, false | Tells whether or not to gzip-compress the output. Default value is false. Note that this option is effective only when zlib is compiled into the plugin.
    dictionary | Dictionary file name | The Dictionary part will be stored in this file.
    annotation | Annotation file name | The Annotation part will be stored in this file.
    include | List of dictionary file names (space-separated string) | These files are written in the include section of the Annotation file header.
    types | List of annotation types (space-separated string) | If specified, only these types of annotations are stored in the annotation file. By default, all the annotations are stored.

    The following combinations of options are possible (o: set, x: not set, ?: either):
    dictionary | annotation | include | Result
    o | o | ? | Both the Dictionary and the Annotation file are created. The include option is not necessary, since the value of the dictionary option will be used. If given, its files are also added to the header of the Annotation file.
    o | x | ? | Only the Dictionary file is created. The include option is ignored.
    x | o | o | Only the Annotation file is created. The include option should be set.
    x | x | ? | One combined file is created with the filename given in the function arguments. The include option is ignored.
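
The four combinations can be summarized in a short decision function. This is a sketch only: `plan_cag_store` and the keys of its result are hypothetical, and the order of the header includes (dictionary file first, then the include list) is an assumption.

```python
# Hypothetical sketch: which files a CAG Store call would create for a given
# combination of the dictionary, annotation and include options.
def plan_cag_store(filename, options=None):
    opts = options or {}
    dictionary = opts.get("dictionary")
    annotation = opts.get("annotation")
    include = opts.get("include", "").split()  # space-separated file names
    plan = {"dictionary": None, "annotation": None, "combined": None, "includes": []}
    if dictionary and annotation:
        plan["dictionary"] = dictionary
        plan["annotation"] = annotation
        # The dictionary file is referenced automatically; extra includes are added.
        plan["includes"] = [dictionary] + include
    elif dictionary:
        plan["dictionary"] = dictionary  # include is ignored
    elif annotation:
        plan["annotation"] = annotation
        plan["includes"] = include  # should be non-empty in this case
    else:
        plan["combined"] = filename  # include is ignored
    return plan
```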



    Store2(format, filename, ids, options)

    returns an empty string

    Same as Store(), except that Store2 accepts a list of AGIds for ids.


LCF format

  • Format name: LDC Callhome Format
  • AGLIB format name: LCF
  • Input/Output: yes/yes
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:LCF
    filename:name of the file to load
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional, ignored


    Store(format, filename, id, options)

    returns an empty string
    format:LCF
    filename:output filename
    id:AGId or AGSetId to be stored
    options:optional, ignored


    Store2(format, filename, ids, options)

    returns an empty string

    Same as Store(), except that Store2 accepts a list of AGIds for ids.


SwitchBoard format

  • Format name: Switchboard corpus format
  • AGLIB format name: SwitchBoard
  • Input/Output: yes/no
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:SwitchBoard
    filename:name of the file to load
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional, ignored

TF format

The TF format is composed of records. Each record is composed of fields that are separated by a delimiter. The first field is assumed to be the start time and the second field the end time. The remaining fields are features of the annotation identified by that start/end time. The TF loader/writer can be used for TDF and CSV formats.

  • Format name: Table Format
  • AGLIB format name: TF
  • Input/Output: yes/yes
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:TF
    filename:name of the file to load
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:Required
    Option | Possible values | Description
    seprator | a string | This string is used as the field delimiter.
    header | list of field names delimited by the separator | This is used to determine the annotation feature names.
    ann_type | a string with no white space in it | The type of the annotations will be set to this value. TF is assumed by default.
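
To make the record layout concrete, here is a minimal sketch of how TF text might be turned into annotation records using the three options above. It is not the AGLIB loader. It assumes that the header names only the feature fields that follow the two time fields, which the description does not state explicitly, and it keeps the option name spelled as in the table.

```python
# Hypothetical sketch of TF parsing: start time, end time, then feature fields.
def parse_tf(text, options):
    sep = options["seprator"]           # option name as spelled in the table
    names = options["header"].split(sep)
    ann_type = options.get("ann_type", "TF")
    annotations = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        start, end = float(fields[0]), float(fields[1])
        # Assumption: header names map onto the third field onward.
        features = dict(zip(names, fields[2:]))
        annotations.append(
            {"type": ann_type, "start": start, "end": end, "features": features}
        )
    return annotations
```

With a tab as the separator and `word` and `pos` as the header, each line of a tab-delimited file becomes one annotation with those two features.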


    Store(format, filename, id, options)

    returns an empty string
    format:TF
    filename:output filename
    id:AGId or AGSetId to be stored
    options:Required
    Option | Possible values | Description
    seprator | a string | This string is used as the field delimiter for writing.
    header | list of feature names delimited by the separator | This is needed to format the output. Only the features in the header will be stored in the output. The header also determines the order of fields.
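
The role of the header on output can be sketched in the same spirit. The function below is hypothetical; it writes the start and end times first, then only the features named in the header, in header order. Writing an empty field for a missing feature is an assumption of this sketch.

```python
# Hypothetical sketch of TF output: the header selects and orders the features.
def format_tf(annotations, options):
    sep = options["seprator"]  # option name as spelled in the table
    names = options["header"].split(sep)
    lines = []
    for ann in annotations:
        fields = [str(ann["start"]), str(ann["end"])]
        # Features not listed in the header are dropped; missing ones are empty.
        fields += [ann["features"].get(name, "") for name in names]
        lines.append(sep.join(fields))
    return "".join(line + "\n" for line in lines)
```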


    Store2(format, filename, ids, options)

    returns an empty string

    Same as Store(), except that Store2 accepts a list of AGIds for ids.


TIMIT format

  • Format name: TIMIT corpus format
  • AGLIB format name: TIMIT
  • Input/Output: yes/no
  • Resources: TIMIT Corpus
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:TIMIT
    filename:The common prefix of annotation file set to load.
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional, ignored

TreeBank format

  • Format name: Penn Treebank format
  • AGLIB format name: TreeBank
  • Input/Output: yes/yes
  • Resources: Penn Treebank project homepage
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:TreeBank
    filename:either (1) the name of the file to load, or (2) a Treebank string, if the input type option is used.
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional
    Option | Possible values | Description
    input type | "string" (without quotes) | Tells the loader that the filename argument is not a filename but a string in the TreeBank format.
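
The effect of the input type option on the filename argument can be sketched as a small dispatch. The helper name is hypothetical, and reading the file as UTF-8 is an assumption.

```python
# Hypothetical sketch: decide whether the "filename" argument is a path or
# already the Treebank text itself.
def read_treebank_source(filename, options=None):
    opts = options or {}
    if opts.get("input type") == "string":
        return filename  # the argument is the Treebank string
    with open(filename, encoding="utf-8") as handle:  # encoding is an assumption
        return handle.read()
```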


    Store(format, filename, id, options)

    returns an empty string
    format:TreeBank
    filename:output filename
    id:AGId or AGSetId to be stored
    options:optional, ignored


    Store2(format, filename, ids, options)

    returns an empty string

    Same as Store(), except that Store2 accepts a list of AGIds for ids.


XLabel format

  • Format name: xlabel format
  • AGLIB format name: XLabel
  • Input/Output: yes/no
  • Corpus
  • Programming interface:
  • Load(format, filename, id, sigInfo, options)

    returns a list of AGIds loaded
    format:XLabel
    filename:The common prefix of annotation file set to load.
    id:AGId or AGSetId. If no object with that id exists, the loader will create one.
    sigInfo:optional. See Load in AGAPI doc for the value.
    options:optional, ignored
