boltz_data.ccd

Chemical Component Dictionary (CCD) functionality.

Functions

chemical_component_dictionary_from_path(path, /)

Read a chemical component dictionary from a file.

chemical_component_from_path(path, /)

Read a single chemical component from a CIF file.

chemical_component_from_rcsb(*, comp_id)

Fetch a chemical component from RCSB PDB.

compress_chemical_component(...)

Compress a chemical component to a compact binary representation.

decompress_chemical_component(compressed, /)

Decompress a chemical component from its compressed binary representation.

get_builtin_chemical_component_dictionary()

Get the builtin chemical component dictionary.

get_remote_chemical_component_database()

Get a chemical component database that talks directly to the RCSB API.

read_chemical_component_dictionary_from_mmcif(...)

Read a chemical component dictionary from an mmCIF document or block.

Classes

ChemicalComponent(comp_id, type, name, ...)

Chemical component definition from CCD.

ChemicalComponentAtom(atom_id, element, charge)

Atom in a chemical component.

ChemicalComponentBond(atom_id_1, atom_id_2, ...)

Bond between atoms in a chemical component.

CompressedChemicalComponentDictionary(ccd, /)

A dictionary of chemical components stored in compressed binary format.

class boltz_data.ccd.ChemicalComponent(comp_id, type, name, atoms, bonds)

Chemical component definition from CCD.

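Parameters:
  • comp_id (str)

  • type (str)

  • name (str)

  • atoms (dict[str, ChemicalComponentAtom])

  • bonds (dict[tuple[str, str], ChemicalComponentBond])

__init__(comp_id, type, name, atoms, bonds)
Parameters:
  • comp_id (str)

  • type (str)

  • name (str)

  • atoms (dict[str, ChemicalComponentAtom])

  • bonds (dict[tuple[str, str], ChemicalComponentBond])

Return type:

None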

comp_id: str
type: str
name: str
atoms: dict[str, ChemicalComponentAtom]
bonds: dict[tuple[str, str], ChemicalComponentBond]
class boltz_data.ccd.ChemicalComponentAtom(atom_id, element, charge)

Atom in a chemical component.

Parameters:
  • atom_id (str)

  • element (str)

  • charge (int)

__init__(atom_id, element, charge)
Parameters:
  • atom_id (str)

  • element (str)

  • charge (int)

Return type:

None

atom_id: str
element: str
charge: int
class boltz_data.ccd.ChemicalComponentBond(atom_id_1, atom_id_2, order)

Bond between atoms in a chemical component.

Parameters:
  • atom_id_1 (str)

  • atom_id_2 (str)

  • order (int)

__init__(atom_id_1, atom_id_2, order)
Parameters:
  • atom_id_1 (str)

  • atom_id_2 (str)

  • order (int)

Return type:

None

atom_id_1: str
atom_id_2: str
order: int
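The three record types nest into one another: a component holds its atoms keyed by atom ID and its bonds keyed by a pair of atom IDs. The sketch below uses hypothetical stand-in classes (`Atom`, `Bond`, `Component`) and a hypothetical helper `make_component` rather than the `boltz_data.ccd` classes. It assumes each bond key is the bond's own `(atom_id_1, atom_id_2)` pair, which the `dict[tuple[str, str], ...]` annotation suggests but the reference does not state.

```python
from dataclasses import dataclass


# Hypothetical stand-ins mirroring the documented fields only.
@dataclass
class Atom:
    atom_id: str
    element: str
    charge: int


@dataclass
class Bond:
    atom_id_1: str
    atom_id_2: str
    order: int


@dataclass
class Component:
    comp_id: str
    type: str
    name: str
    atoms: dict[str, Atom]
    bonds: dict[tuple[str, str], Bond]


def make_component(comp_id, type_, name, atoms, bonds):
    """Index atoms by atom_id and bonds by (atom_id_1, atom_id_2) (assumed key)."""
    atom_map = {atom.atom_id: atom for atom in atoms}
    bond_map = {}
    for bond in bonds:
        # Reject bonds that reference atoms not defined in this component.
        if bond.atom_id_1 not in atom_map or bond.atom_id_2 not in atom_map:
            raise ValueError(f"bond {bond} references an unknown atom")
        bond_map[(bond.atom_id_1, bond.atom_id_2)] = bond
    return Component(comp_id, type_, name, atom_map, bond_map)


water = make_component(
    "HOH", "NON-POLYMER", "WATER",
    atoms=[Atom("O", "O", 0), Atom("H1", "H", 0), Atom("H2", "H", 0)],
    bonds=[Bond("O", "H1", 1), Bond("O", "H2", 1)],
)
print(sorted(water.atoms))  # ['H1', 'H2', 'O']
print(water.bonds[("O", "H1")].order)  # 1
```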
class boltz_data.ccd.CompressedChemicalComponentDictionary(ccd, /)

A dictionary of chemical components stored in compressed binary format.

This class provides a mapping interface to access chemical components while keeping them compressed in memory. Components are decompressed on-demand and cached using an LRU cache to balance memory usage and performance.

Parameters:

ccd (dict[str, bytes])
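The on-demand decompression with an LRU cache described above can be sketched with `collections.abc.Mapping` and `functools.lru_cache`. `LazyCompressedMapping`, `toy_compress`, and `toy_decompress` are hypothetical names; the toy codec (JSON plus zlib) only stands in for the real binary format, and the default cache size of 128 is an arbitrary choice, not the library's setting.

```python
import json
import zlib
from collections.abc import Iterator, Mapping
from functools import lru_cache


def toy_compress(value: dict) -> bytes:
    # Hypothetical codec for illustration: JSON + zlib, not the CCD binary format.
    return zlib.compress(json.dumps(value).encode("ascii"))


def toy_decompress(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("ascii"))


class LazyCompressedMapping(Mapping):
    """Read-only mapping that keeps values compressed and decodes them on access."""

    def __init__(self, blobs: dict[str, bytes], cache_size: int = 128) -> None:
        self._blobs = blobs
        # Bound how many decoded values stay in memory at once.
        self._decode = lru_cache(maxsize=cache_size)(self._decode_uncached)

    def _decode_uncached(self, key: str) -> dict:
        return toy_decompress(self._blobs[key])

    def __getitem__(self, key: str) -> dict:
        if key not in self._blobs:
            raise KeyError(key)
        return self._decode(key)

    def __iter__(self) -> Iterator[str]:
        return iter(self._blobs)

    def __len__(self) -> int:
        return len(self._blobs)


components = LazyCompressedMapping({"HOH": toy_compress({"name": "WATER"})})
print(components["HOH"]["name"])  # WATER
```

Because cached values are shared between lookups, mutating a returned object would also change what later lookups of the same key return while it stays in the cache.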

__init__(ccd, /)

Initialize the compressed dictionary.

Parameters:

ccd (dict[str, bytes]) – Dictionary mapping component IDs to compressed bytes.

Return type:

None

classmethod from_ccd(ccd, /)

Create a compressed dictionary from a dictionary of chemical components.

Parameters:

ccd (dict[str, ChemicalComponent]) – Dictionary mapping component IDs to ChemicalComponent objects.

Return type:

CompressedChemicalComponentDictionary

Returns:

A CompressedChemicalComponentDictionary instance.

classmethod from_file(path, /)

Load a compressed chemical component dictionary from a file.

Supports both plain pickle (.pkl) and gzipped pickle (.pkl.gz) formats.

Parameters:

path (str | Path) – Path to the file containing the compressed dictionary.

Return type:

CompressedChemicalComponentDictionary

Returns:

A CompressedChemicalComponentDictionary instance.

to_file(path, /)

Save the compressed chemical component dictionary to a file.

If the file extension is .gz, the output will be gzip compressed.

Parameters:

path (str | Path) – Path to save the dictionary to.

Return type:

None
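A minimal sketch of the extension-based behaviour of `to_file` and `from_file`, assuming the on-disk payload is simply the pickled `dict[str, bytes]` held in the `ccd` attribute. `save_blobs` and `load_blobs` are hypothetical helper names, not part of `boltz_data.ccd`.

```python
import gzip
import pickle
from pathlib import Path


def save_blobs(blobs: dict[str, bytes], path) -> None:
    """Pickle a {comp_id: bytes} dict, gzip-compressing when the suffix is .gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "wb") as fh:
        pickle.dump(blobs, fh)


def load_blobs(path) -> dict[str, bytes]:
    """Inverse of save_blobs: accepts both .pkl and .pkl.gz files."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)
```

As with any pickle file, only load dictionaries from trusted sources, since `pickle.load` can execute arbitrary code.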

ccd: dict[str, bytes]
boltz_data.ccd.chemical_component_dictionary_from_path(path, /)

Read a chemical component dictionary from a file.

Supports pickled compressed dictionaries (.pkl.gz, .pkl) as well as CIF files (.cif.gz, .cif).

Parameters:

path (str | Path) – Path to the file containing the chemical component dictionary.

Return type:

Mapping[str, ChemicalComponent]

Returns:

A mapping from component IDs to ChemicalComponent objects.

Raises:

ValueError – If the file format is not supported.
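The format check above amounts to dispatching on the file suffix. `detect_ccd_format` is a hypothetical helper that only classifies the path without opening the file; it assumes suffix matching is case-sensitive, and its error message is made up.

```python
from pathlib import Path


def detect_ccd_format(path) -> tuple[str, bool]:
    """Return (kind, gzipped) for a CCD path, where kind is 'pickle' or 'cif'."""
    name = Path(path).name
    gzipped = name.endswith(".gz")
    base = name[: -len(".gz")] if gzipped else name
    if base.endswith(".pkl"):
        return "pickle", gzipped
    if base.endswith(".cif"):
        return "cif", gzipped
    # Anything else is rejected, mirroring the documented ValueError.
    raise ValueError(f"Unsupported chemical component dictionary format: {path}")


print(detect_ccd_format("components.cif.gz"))  # ('cif', True)
```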

boltz_data.ccd.chemical_component_from_path(path, /)

Read a single chemical component from a CIF file.

Parameters:

path (str | Path) – Path to the CIF file containing exactly one chemical component.

Return type:

ChemicalComponent

Returns:

The chemical component.

Raises:

ValueError – If the file does not contain exactly one chemical component.

boltz_data.ccd.chemical_component_from_rcsb(*, comp_id)

Fetch a chemical component from RCSB PDB.

Parameters:

comp_id (str) – The component ID to fetch.

Return type:

ChemicalComponent

Returns:

The chemical component.

Raises:

ValueError – If the component ID is not found in RCSB PDB.

boltz_data.ccd.compress_chemical_component(chemical_component, /)

Compress a chemical component to a compact binary representation.

The compression format minimizes size by:

  • Using null-terminated ASCII strings for text fields

  • Storing element symbols as single-byte periodic table indices

  • Using 1-byte or 2-byte integers for indices based on molecule size

  • Encoding bonds by atom indices rather than atom IDs to save space

  • Packing leaving atom flags as single bits within atom records

Parameters:

chemical_component (ChemicalComponent) – The chemical component to compress.

Return type:

bytes

Returns:

The compressed representation as bytes.
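To make the size-saving rules concrete, here is a minimal encoder sketch that follows the field order given in the `decompress_chemical_component` documentation. It is not the library's implementation: `encode_component` is hypothetical, it omits atom charges and the leaving-atom bits (the documented layout does not show where they go), it assumes 2-byte integers are little-endian, and its periodic table is truncated to atomic numbers 1 to 20.

```python
import struct

# Truncated periodic table (atomic numbers 1-20), enough for this sketch only.
ELEMENTS = ("H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
            "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca")
# Case-insensitive lookup so upper-case CCD symbols such as "CL" also resolve.
ATOMIC_NUMBER = {symbol.upper(): z for z, symbol in enumerate(ELEMENTS, start=1)}


def _cstr(text: str) -> bytes:
    """Encode a null-terminated ASCII string."""
    return text.encode("ascii") + b"\x00"


def encode_component(comp_id, type_, name, atoms, bonds) -> bytes:
    """Encode atoms [(atom_id, element)] and bonds [(id_1, id_2, order)]."""
    large = len(atoms) > 255 or len(bonds) > 255
    # Count/index width follows the large_molecule flag (little-endian assumed).
    index = struct.Struct("<H" if large else "<B")
    out = bytearray(_cstr(comp_id) + _cstr(type_) + _cstr(name))
    out.append(1 if large else 0)
    out += index.pack(len(atoms))
    position = {}
    for i, (atom_id, element) in enumerate(atoms):
        position[atom_id] = i
        out += _cstr(atom_id)
        out.append(ATOMIC_NUMBER[element.upper()])  # single-byte element index
    out += index.pack(len(bonds))
    for atom_id_1, atom_id_2, order in bonds:
        # Bonds store atom positions in the list above rather than atom IDs.
        out += index.pack(position[atom_id_1])
        out += index.pack(position[atom_id_2])
        out.append(order)
    return bytes(out)


blob = encode_component(
    "HOH", "NON-POLYMER", "WATER",
    atoms=[("O", "O"), ("H1", "H"), ("H2", "H")],
    bonds=[("O", "H1", 1), ("O", "H2", 1)],
)
print(len(blob))  # 42
```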

boltz_data.ccd.decompress_chemical_component(compressed, /)

Decompress a chemical component from its compressed binary representation.

The compression format is a custom binary encoding that minimizes size by:

  • Using null-terminated ASCII strings for text fields

  • Storing element symbols as single-byte periodic table indices

  • Using 1-byte or 2-byte integers for indices based on molecule size

  • Encoding bonds by atom indices rather than atom IDs to save space

Binary format structure:

  1. Component metadata:

    • Null-terminated ASCII string: comp_id

    • Null-terminated ASCII string: type

    • Null-terminated ASCII string: name

  2. Size indicator:

    • 1 byte: large_molecule flag (1 if num_atoms > 255 or num_bonds > 255)

    • 1 or 2 bytes: number of atoms (based on large_molecule flag)

  3. Atom records (repeated for each atom):

    • Null-terminated ASCII string: atom_id

    • 1 byte: element as periodic table index (1-120)

  4. Bond records:

    • 1 or 2 bytes: number of bonds (based on large_molecule flag)

    • For each bond:

      • 1 or 2 bytes: index of first atom in atom list

      • 1 or 2 bytes: index of second atom in atom list

      • 1 byte: bond order (1=single, 2=double, 3=triple)

Parameters:

compressed (bytes) – The compressed chemical component as bytes.

Return type:

ChemicalComponent

Returns:

The decompressed ChemicalComponent object.
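The binary layout above can be read back with a small cursor-based parser. `decode_component` is a hypothetical sketch that returns plain Python values instead of a `ChemicalComponent`; it reads only the fields listed in the layout (no charges or leaving-atom bits), assumes multi-byte integers are little-endian, and uses a periodic table truncated to atomic numbers 1 to 20.

```python
# Truncated periodic table (atomic numbers 1-20), enough for this sketch only.
ELEMENTS = ("H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
            "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca")


def decode_component(data: bytes) -> dict:
    """Parse the documented layout into plain Python values."""
    pos = 0

    def read_cstr() -> str:
        # Read up to the next null byte, then skip past it.
        nonlocal pos
        end = data.index(b"\x00", pos)
        text = data[pos:end].decode("ascii")
        pos = end + 1
        return text

    def read_int(size: int) -> int:
        # Multi-byte integers are assumed little-endian; the docs do not say.
        nonlocal pos
        value = int.from_bytes(data[pos:pos + size], "little")
        pos += size
        return value

    comp_id, type_, name = read_cstr(), read_cstr(), read_cstr()
    width = 2 if read_int(1) else 1  # large_molecule flag picks 1- or 2-byte counts
    atoms = []
    for _ in range(read_int(width)):
        atom_id = read_cstr()
        atoms.append((atom_id, ELEMENTS[read_int(1) - 1]))
    bonds = []
    for _ in range(read_int(width)):
        first, second, order = read_int(width), read_int(width), read_int(1)
        # Map stored atom positions back to atom IDs.
        bonds.append((atoms[first][0], atoms[second][0], order))
    return {"comp_id": comp_id, "type": type_, "name": name,
            "atoms": atoms, "bonds": bonds}


water = decode_component(
    b"HOH\x00NON-POLYMER\x00WATER\x00\x00\x03"
    b"O\x00\x08H1\x00\x01H2\x00\x01\x02\x00\x01\x01\x00\x02\x01"
)
print(water["bonds"])  # [('O', 'H1', 1), ('O', 'H2', 1)]
```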

boltz_data.ccd.get_builtin_chemical_component_dictionary()

Get the builtin chemical component dictionary.

Return type:

Mapping[str, ChemicalComponent]

boltz_data.ccd.get_remote_chemical_component_database()

Get a chemical component database that talks directly to the RCSB API.

Return type:

RemoteChemicalComponentDatabase

boltz_data.ccd.read_chemical_component_dictionary_from_mmcif(mmcif, /)

Read a chemical component dictionary from an mmCIF document or block.

Parameters:

mmcif (Document | Block) – The mmCIF document or block containing chemical component definitions.

Return type:

dict[str, ChemicalComponent]

Returns:

Dictionary mapping component IDs to ChemicalComponent objects.