Unit IDs

We have developed a naming scheme to uniquely identify all units (amino acids, nucleotides, ligands, atoms, etc.) in any 3D structure from the PDB. It allows unambiguous identification and naming not only of individual components but also of collections of them, such as loops and helices. A clear naming scheme makes it simpler for researchers to share data and annotations and to build web services on top of them. These IDs are fundamental to our structural annotations.

The IDs are strings of ordered fields separated by vertical bars (‘|’). Below we describe how to construct each field and what it means for the two types of unit IDs.

Unit ID Format

These IDs are based on the data in mmCIF files and may contain symmetry operators. A separate introduction to symmetry operators and biological assemblies is available. This format uniquely identifies all units and atoms in a structure.

Several fields in the format are optional and take default values when absent. Optional fields are marked ‘(Optional)’. If any optional field is included, all fields before it must also be included (left blank when they hold their default values). The symmetry operator is the exception: it may be omitted even when other optional fields are present.

For consistency, all case-insensitive fields should be written in upper case.

Unit Identifier Specification

The list below gives the type and case sensitivity of each field, together with the mmCIF item from which the field's data is taken. Several example IDs, with their interpretation, follow at the end.

Unit IDs can also be used to identify atoms. When identifying entire residues, the atom field is left blank.

  1. PDB ID Code
    • From PDBx/mmCIF item: _entry.id
    • 4 characters, case-insensitive
  2. Model Number
    • From PDBx/mmCIF item: _atom_site.pdbx_PDB_model_num
    • integer, range 1-99
  3. Chain ID
    • From PDBx/mmCIF item: _atom_site.auth_asym_id
    • string, case-sensitive
  4. Residue/Nucleotide/Component Identifier
    • From PDBx/mmCIF item: _atom_site.label_comp_id
    • 1-3 characters, case-insensitive
  5. Residue/Nucleotide/Component Number
    • From PDBx/mmCIF item: _atom_site.auth_seq_id
    • integer, range -999 to 9999 (negative residue numbers do occur)
  6. Atom Name (Optional, default: blank)
    • From PDBx/mmCIF item: _atom_site.label_atom_id
    • 0-4 characters, case-insensitive
    • blank means all atoms
  7. Alternate ID (Optional, default: blank)
    • From PDBx/mmCIF item: _atom_site.label_alt_id
    • One of ['A', 'B', '0'], case-insensitive
  8. Insertion Code (Optional, default: blank)
    • From PDBx/mmCIF item: _atom_site.pdbx_PDB_ins_code
    • 1 character, case-insensitive
  9. Symmetry Operation (Optional, default: 1_555)
    • As defined in PDBx/mmCIF item: _pdbx_struct_oper_list.name
    • 5-6 characters, case-insensitive
    • For viral icosahedral structures, use “P_” followed by the model number in place of a symmetry operator. For example, 1A34|1|A|VAL|88||||P_1
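The field list above maps naturally onto a small formatter. This is a minimal sketch, assuming the reading of the optional-field rule that the examples in this document follow: atom name, alternate ID and insertion code are written together or not at all, and the symmetry field appears only when it differs from 1_555. The name format_unit_id and its arguments are hypothetical and are not part of UnitParser.

```python
# Minimal sketch: join the nine unit ID fields with '|'.
# format_unit_id and its keyword arguments are illustrative assumptions,
# not the API of the reference UnitParser module.

DEFAULT_SYMMETRY = "1_555"


def format_unit_id(pdb_id, model, chain, comp_id, number,
                   atom="", alt_id="", ins_code="", symmetry=DEFAULT_SYMMETRY):
    """Build a unit ID, writing case-insensitive fields in upper case."""
    core = [
        pdb_id.upper(),    # PDB ID: case-insensitive
        str(model),        # model number
        chain,             # chain ID: case-sensitive, left as given
        comp_id.upper(),   # component identifier: case-insensitive
        str(number),       # residue number (may be negative)
    ]
    # Atom name, alternate ID and insertion code travel together: if any of
    # them is set, all three are written (blank ones as empty strings).
    optional = [atom.upper(), alt_id.upper(), ins_code.upper()]
    symmetry = symmetry.upper()

    if symmetry != DEFAULT_SYMMETRY:
        fields = core + optional + [symmetry]   # 9 fields, 8 separators
    elif any(optional):
        fields = core + optional                # 8 fields, 7 separators
    else:
        fields = core                           # 5 fields, 4 separators
    return "|".join(fields)


print(format_unit_id("1abc", 1, "B", "u", 10))                      # 1ABC|1|B|U|10
print(format_unit_id("1ABC", 1, "D", "C", 25, symmetry="2_655"))    # 1ABC|1|D|C|25||||2_655
```

Keeping the chain ID out of the upper-casing matters because chain IDs are case-sensitive; all other text fields are normalized.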

Examples

  • Chain A in model 1 of 1ABC = “1ABC|1|A”
  • Nucleotide U(10) chain B of 1ABC = “1ABC|1|B|U|10”
  • Nucleotide U(15A) chain B, default symmetry operator = “1ABC|1|B|U|15|||A”
  • Nucleotide C(25) chain D subject to symmetry operation 2_655 = “1ABC|1|D|C|25||||2_655”

Unit IDs for entire residues contain 4, 7, or 8 separators (‘|’), that is, 5, 8, or 9 fields.
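Reading an ID reverses the process: check the number of separators, split, validate, and fill in defaults. This is a minimal sketch, assuming only residue- and atom-level IDs (5, 8 or 9 fields) are accepted, so a chain-level ID such as 1ABC|1|A is rejected. UnitId and parse_unit_id are hypothetical names; UnitParser remains the reference parser.

```python
from typing import NamedTuple


class UnitId(NamedTuple):
    """Parsed unit ID; defaults follow the field specification."""
    pdb_id: str
    model: int
    chain: str
    comp_id: str
    number: int
    atom: str = ""
    alt_id: str = ""
    ins_code: str = ""
    symmetry: str = "1_555"


def parse_unit_id(text: str) -> UnitId:
    """Split a residue- or atom-level unit ID and fill in defaults."""
    fields = text.split("|")
    if len(fields) not in (5, 8, 9):
        raise ValueError(
            f"expected 4, 7 or 8 separators, found {len(fields) - 1}: {text!r}")

    pdb_id, model, chain, comp_id, number = fields[:5]
    if len(pdb_id) != 4:
        raise ValueError(f"PDB ID must be 4 characters: {pdb_id!r}")
    model_num = int(model)   # raises ValueError if not an integer
    if not 1 <= model_num <= 99:
        raise ValueError(f"model number out of range 1-99: {model_num}")
    seq_num = int(number)
    if not -999 <= seq_num <= 9999:
        raise ValueError(f"residue number out of range -999 to 9999: {seq_num}")

    # Fields 6-8 are present together in the 8- and 9-field forms.
    atom, alt_id, ins_code = fields[5:8] if len(fields) >= 8 else ("", "", "")
    # A missing or blank symmetry field means the default operator 1_555.
    symmetry = fields[8] if len(fields) == 9 and fields[8] else "1_555"

    return UnitId(pdb_id.upper(), model_num, chain, comp_id.upper(), seq_num,
                  atom.upper(), alt_id.upper(), ins_code.upper(),
                  symmetry.upper())
```

For example, parse_unit_id("1ABC|1|B|U|15|||A") yields ins_code "A" and the default symmetry "1_555".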

Atom Identifier Format

To be added later.

Tools for Unit IDs

We have developed some tools to generate and work with these formats.

  • UnitParser is a Python module for parsing new-style unit IDs. It is the reference parser for unit IDs.
  • UnitIdTranslation is a Python tool that generates the new-style IDs for all units in a structure, given its mmCIF file.
  • Translator is a web service to translate between the two ID formats.
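To illustrate how the mmCIF items in the field list combine into residue-level IDs, here is a hypothetical sketch. It is not UnitIdTranslation's code or API. It assumes the _atom_site records have already been read into dicts keyed by mmCIF item name, that mmCIF uses ‘?’ or ‘.’ for a missing insertion code, and that only the asymmetric unit (default operator 1_555) is wanted, so no symmetry field is written.

```python
# Hypothetical sketch of deriving residue-level unit IDs from mmCIF atom
# records. Each row is assumed to be a dict keyed by mmCIF item name, as a
# generic mmCIF reader might produce; this is not UnitIdTranslation's API.


def residue_unit_ids(entry_id, atom_site_rows):
    """Return one unit ID per residue, in order of first appearance."""
    seen = set()
    unit_ids = []
    for row in atom_site_rows:
        # mmCIF writes '?' or '.' when there is no insertion code.
        ins_code = row.get("_atom_site.pdbx_PDB_ins_code", "?")
        if ins_code in ("?", "."):
            ins_code = ""
        fields = [
            entry_id.upper(),                            # _entry.id
            str(row["_atom_site.pdbx_PDB_model_num"]),   # model number
            row["_atom_site.auth_asym_id"],              # chain (case-sensitive)
            row["_atom_site.label_comp_id"].upper(),     # component identifier
            str(row["_atom_site.auth_seq_id"]),          # residue number
        ]
        if ins_code:
            # Atom name and alternate ID stay blank for a whole residue.
            fields += ["", "", ins_code.upper()]
        unit_id = "|".join(fields)
        if unit_id not in seen:   # many atoms share one residue
            seen.add(unit_id)
            unit_ids.append(unit_id)
    return unit_ids
```

Because every atom of a residue yields the same residue-level ID, the set keeps only the first occurrence while the list preserves file order.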

Web services that use Unit IDs

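  • View the 3D coordinates of a set of units by listing their unit IDs in a display3D URL, for example:

http://rna.bgsu.edu/rna3dhub/display3D/unitid/1S72|1|0|A|965,1S72|1|0|U|1003,1S72|1|H|HIS|92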

Individual units are listed in the URL, separated by commas. This example shows a UA cWW base pair (A965 and U1003 in chain 0) interacting with an amino acid (HIS92 in chain H). Such links make it easy to direct readers of a website or article to a view of the coordinates.
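Such a link can be generated programmatically. This sketch only reproduces the pattern of the example URL above: the base path followed by comma-separated unit IDs, with no percent-encoding. The helper name display3d_url is hypothetical, and whether the service also accepts encoded forms is not stated here.

```python
# Build a display3D link from unit IDs, following the example URL.
# display3d_url is a hypothetical helper; IDs are joined with commas and
# not percent-encoded, as in the example.

DISPLAY3D_BASE = "http://rna.bgsu.edu/rna3dhub/display3D/unitid/"


def display3d_url(unit_ids):
    """Return a display3D URL listing the given unit IDs."""
    for uid in unit_ids:
        if "," in uid:
            # A comma would be read as a separator between units.
            raise ValueError(f"unit ID contains a comma: {uid!r}")
    return DISPLAY3D_BASE + ",".join(unit_ids)


print(display3d_url(["1S72|1|0|A|965", "1S72|1|0|U|1003", "1S72|1|H|HIS|92"]))
```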

  • Map unit IDs from 3D structures to the corresponding columns of a multiple sequence alignment from the Comparative RNA Web (CRW) using the R3D-2-MSA web service. See the R3D-2-MSA help page for examples.