[ddlm-group] Proposal to enhance the behaviour of a DDLm "Set" category: please consider

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Proposal to enhance the behaviour of a DDLm "Set" category: please consider
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 25 May 2016 09:42:25 +1000
Dear DDLm group,

Please find below a proposal to add additional behaviour to DDLm Set categories.  The "Background" section provides some of the motivation for this.  In a nutshell, the proposal creates a mechanism that would allow normally single-valued datanames to take multiple values (i.e. become looped), but only under tightly specified conditions.

James.


Draft proposal to adjust meaning of 'Set' Categories
====================================================

Version: 1  Date: 2016-05-23

Summary
=======

It is proposed that the text accompanying the description of a 'Set'
category in the DDLm attribute dictionary (in the definition of
_definition.class) is changed as follows:

Old text:

 Set          
;                 Category of items that form a set (but not a
                  loopable list). These items may be referenced
                  as a class of items in a dREL methods expression.
;       

New text:

 Set          
;                 Category of items that are usually not looped.  Items from this
                  category may only be looped if the following conditions hold:
                  (1) A category key is defined
                  (2) All other datanames appearing in the same datablock are taken from
                  categories that:
                      (i) Include a dataname with a _name.linked_item_id that refers directly or
                      indirectly to the category key defined in (1)
                      (ii) Include the dataname from (i) in the _category_key.name loop
;       
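
As an illustration only, the two conditions in the new text could be
checked along the following lines.  This is a minimal sketch, not part
of the proposal: the function names, the dictionary layout
(category_keys, links) and the choice of example entries are all
assumptions made for this example, and a real implementation would read
this information from the DDLm dictionary itself.

```python
# Hypothetical, simplified view of a DDLm dictionary (assumption):
#   category_keys: category name -> datanames in its _category_key.name loop
#   links:         dataname -> value of its _name.linked_item_id
category_keys = {
    "space_group": ["_space_group.id"],
    "space_group_symop": ["_space_group_symop.id", "_space_group_symop.sg_id"],
    "atom_site": ["_atom_site.label"],
}
links = {
    "_space_group_symop.sg_id": "_space_group.id",
}


def resolves_to(dataname, target, links):
    """Follow linked_item_id pointers from dataname; True if target is reached."""
    seen = set()
    current = links.get(dataname)
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)  # guard against circular links
        current = links.get(current)
    return False


def set_category_may_loop(set_cat, block_cats, category_keys, links):
    """Return True if set_cat may be looped in a datablock holding block_cats."""
    # Condition (1): the Set category must define a category key.
    set_keys = category_keys.get(set_cat, [])
    if not set_keys:
        return False
    # Condition (2): every other category present must have a key dataname
    # that links, directly or indirectly, to the Set category key.
    for cat in block_cats:
        if cat == set_cat:
            continue
        cat_keys = category_keys.get(cat, [])
        if not any(resolves_to(k, sk, links) for k in cat_keys for sk in set_keys):
            return False
    return True
```

Under these assumptions, a datablock holding only space_group and
space_group_symop items would be allowed to loop space_group, whereas
adding atom_site items would not be allowed, because atom_site has no
key dataname linked to the space_group key.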

Background
==========

There have been persistent requests over the years to re-use
notionally single-valued datanames in contexts that would allow
multiple values. For example, although datablocks containing CIF
structural descriptions expect a single space group, an application
that wished to tabulate space groups together with symmetry operators
and transformations requires multiple space groups to appear in the
data block.

Simply allowing a previously single-valued dataname to optionally take
multiple values causes the meaning of those datanames whose
interpretation has implicitly depended on the assumption of a single,
overall value to become ambiguous.  For example, fractional
coordinates and reflection hkl are calculated and interpreted relative
to a particular space group and set of cell parameters. As soon as
multiple space groups are available, an unambiguous interpretation of
these items is impossible.  Therefore, DDL1 dictionaries have never
expanded to allow looping of previously single-valued datanames.

In apparent contrast, all categories in DDL2 are notionally loopable
and are provided with a category key.  In order to reproduce the DDL1
behaviour, at the domain dictionary level a dataname is defined
("entry.id" for mmCIF) that identifies the datablock and is
constrained by the definition to have a single value. All categories
that should only have single-valued datanames are given a category key
that is a child of this dataname. For example, it is not possible to
provide multiple space groups in a single datablock using mmCIF
datanames, as the symmetry category has a key that points to entry.id,
and is thus constrained to a single value.

At the present time the DDL1 core dictionary is being translated to
DDLm, and we have promised that datablocks written according to the
old DDL1 dictionaries will continue to be interpreted in exactly the
same way after application of aliases found in the new DDLm
dictionaries.  At the same time, we seek to integrate the DDL2
symmetry dictionary into the core dictionary, because both msCIF and
the draft magCIF dictionaries build on datanames defined within it.
As a DDL2 dictionary, the symmetry dictionary defined a looped space
group category, although the msCIF and magCIF uses of it assume a
single overall space group.  See below under 'legacy issues' for
further discussion of this.

Requirements
============

Any change in the single-valuedness of a dataname must meet the
following practical requirements:

(1) The interpretation of existing datablocks must not change (after
transformation of datanames according to aliases)
(2) Existing software must either fail or correctly interpret
datablocks written according to the new standard.

We immediately conclude that any datanames whose interpretation relies
on an overall value for some particular dataname *may not appear* in
datablocks that instead have multiple values for that dataname.
Otherwise, a pre-existing program may read in the values of these
dependent datanames, unaware that they are to be interpreted in
conjunction with a particular value of the newly looped dataname,
leading to implementation-dependent failure or, in the worst case,
incorrect results (for example, generation of too many
symmetry-equivalent atomic positions).

Evaluation of proposed change against the requirements
======================================================

(1) The interpretation of already existing datablocks is not changed
by the above modifications.  As no category key historically existed
for categories with non-looped datanames, condition (2) in the new
definition cannot be met (as no datanames pointing to the non-existent
category key could have existed) and so all datanames are interpreted
as for the old DDL1 scheme.

(2) Existing software expects a single value for the previously
unlooped dataname. When confronted with multiple values, it will
either fail (which is acceptable) or choose a particular value. As the
remainder of the datanames in the datablock did not exist when the
existing software was written, the software will not be able to
proceed to perform any calculations or retrieve information that is
liable to misinterpretation.  The one use case for which
misinterpretation is possible is that in which only information from
the single, newly-looped category is sought, for example, collection
of space-group statistics.

Legacy issues
=============

The symmetry dictionary includes two categories, space_group_symop and
space_group_Wyckoff, that include category keys that point to the
overall space_group category key and therefore meet the requirements
of condition (2) of the new definition.  It is thus possible to produce
datafiles containing multiple space groups and symmetry operator lists
that may be misinterpreted by existing software if, as proposed,
space_group_symop datanames are aliased to the symmetry_equiv category
in DDL1, violating our requirement (2).  However, as discussed above,
no other currently defined space-group-dependent datanames may appear
in such multi-spacegroup files and so the potential for
misinterpretation is restricted to applications that expect a single
space group and only deal with symmetry operators, which would appear
to be an unusual use-case.
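
The kind of misinterpretation described above can be pictured with a
small sketch.  The table layout, the space group identifiers and the
function names below are hypothetical and chosen only for illustration;
the point is that a reader written for a single space group sees one
flat list of operators.

```python
# Hypothetical rows of a looped symmetry-operator table:
# (space group identifier, operator in xyz notation).
symop_rows = [
    ("1", "x,y,z"),     # operators belonging to a first space group
    ("2", "x,y,z"),     # operators belonging to a second space group
    ("2", "-x,-y,-z"),
]


def legacy_operators(rows):
    """Legacy behaviour: ignore the space group key and take every operator."""
    return [xyz for _sg_id, xyz in rows]


def operators_for(rows, sg_id):
    """Key-aware behaviour: keep only operators of the requested space group."""
    return [xyz for row_sg_id, xyz in rows if row_sg_id == sg_id]
```

Here legacy_operators(symop_rows) returns three operators, which
matches neither space group, while operators_for(symop_rows, "2")
returns the two operators that belong together.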

To mitigate any ongoing problems from this legacy issue, we propose
prominently suggesting that software authors explicitly check for
multiple values when reading any items from the space group category.
Ideally, the space_group_symop category would also be renamed in the
symmetry dictionary, but such renaming causes problems for existing
software authors and would need to be conducted only after
consultation with the relevant community (e.g. the Bilbao
crystallographic server).
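
A minimal sketch of the explicit check suggested above follows.  It
assumes a datablock that has already been parsed into a mapping from
dataname to either a single value or a list of values; that
representation, the exception class and the function name are
assumptions for this example and do not correspond to any particular
CIF library.

```python
class MultipleValuesError(ValueError):
    """Raised when a dataname expected to be single-valued is looped."""


def single_value(block, dataname):
    """Return the single value of dataname, refusing to guess among several.

    block is assumed to map datanames to a scalar or to a list of values.
    """
    value = block[dataname]
    if isinstance(value, (list, tuple)):
        if len(value) != 1:
            raise MultipleValuesError(
                f"{dataname} has {len(value)} values; expected exactly one"
            )
        return value[0]
    return value
```

Failing loudly in this way is consistent with requirement (2): software
that does not understand looped space groups stops, rather than
silently choosing one of the values.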

Note that choosing new names for the datanames contained
in the DDLm core_CIF dictionary is not a desirable option, as both
the magCIF and msCIF dictionaries base their naming schemes on
the symmetry CIF dictionary.

Future development
==================

The proposed change slightly reduces definition proliferation by
allowing both single-valued and multiple-valued versions of a dataname
to share the same definition.  However, all categories and datanames
that depend on the single-valued dataname must still have alternative
names defined when each single-valued dependency becomes
multiple-valued, leading to proliferation of definition blocks that
add very little information. Future work would create (e.g.) DDLm
category attributes that auto-define category datanames based on the
contents of other categories.  Using these attributes, add-on
dictionaries could be created economically and semi-automatically.

The restriction (2) in the new definition is excessive, in that some
categories may never have depended on overall space group (e.g. audit
information) but will nevertheless be excluded from datablocks. Future
work would develop a way to list explicitly the (known) dependencies
on overall values within each category - for calculated values, this
information is already automatically extractable from dREL methods -
to allow partial relaxation of (2). In practice, we expect that
categories that 'obviously' do not depend on the overall values will
still be included in datablocks, but it would be good to capture this
information in an attribute.
