boltz_data.ccd
Chemical Component Dictionary (CCD) functionality.
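Functions
chemical_component_dictionary_from_path | Read a chemical component dictionary from a file.
chemical_component_from_path | Read a single chemical component from a CIF file.
chemical_component_from_rcsb | Fetch a chemical component from RCSB PDB.
compress_chemical_component | Compress a chemical component to a compact binary representation.
decompress_chemical_component | Decompress a chemical component from its compressed binary representation.
get_builtin_chemical_component_dictionary | Get the builtin chemical component dictionary.
get_remote_chemical_component_database | Get a chemical component database that talks directly to the RCSB API.
read_chemical_component_dictionary_from_mmcif | Read chemical component dictionary from mmCIF block.
Classes
ChemicalComponent | Chemical component definition from CCD.
ChemicalComponentAtom | Atom in a chemical component.
ChemicalComponentBond | Bond between atoms in a chemical component.
CompressedChemicalComponentDictionary | A dictionary of chemical components stored in compressed binary format.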
- class boltz_data.ccd.ChemicalComponent(comp_id, type, name, atoms, bonds)
Chemical component definition from CCD.
- Parameters:
- __init__(comp_id, type, name, atoms, bonds)
- atoms: dict[str, ChemicalComponentAtom]
- class boltz_data.ccd.ChemicalComponentAtom(atom_id, element, charge)
Atom in a chemical component.
- __init__(atom_id, element, charge)
- class boltz_data.ccd.ChemicalComponentBond(atom_id_1, atom_id_2, order)
Bond between atoms in a chemical component.
- __init__(atom_id_1, atom_id_2, order)
- class boltz_data.ccd.CompressedChemicalComponentDictionary(ccd, /)
A dictionary of chemical components stored in compressed binary format.
This class provides a mapping interface to access chemical components while keeping them compressed in memory. Components are decompressed on-demand and cached using an LRU cache to balance memory usage and performance.
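The on-demand decompression described above can be illustrated with a minimal sketch. This is not the boltz_data implementation: `LazyCompressedMapping` is a hypothetical class, zlib-compressed strings stand in for the library's custom binary encoding, and the cache size of 128 is an arbitrary assumption.

```python
import zlib
from collections.abc import Iterator, Mapping
from functools import lru_cache


class LazyCompressedMapping(Mapping):
    """Hypothetical mapping that keeps values compressed until they are read."""

    def __init__(self, items: dict, cache_size: int = 128) -> None:
        # Everything is held in compressed form; zlib is only a stand-in codec.
        self._blobs = {
            key: zlib.compress(value.encode("utf-8")) for key, value in items.items()
        }
        # Per-instance LRU cache wrapped around the decompression step.
        self._load = lru_cache(maxsize=cache_size)(self._decompress)

    def _decompress(self, key: str) -> str:
        return zlib.decompress(self._blobs[key]).decode("utf-8")

    def __getitem__(self, key: str) -> str:
        if key not in self._blobs:
            raise KeyError(key)
        return self._load(key)

    def __iter__(self) -> Iterator:
        return iter(self._blobs)

    def __len__(self) -> int:
        return len(self._blobs)


components = LazyCompressedMapping({"HOH": "water", "ATP": "adenosine triphosphate"})
first = components["HOH"]   # decompressed and cached
second = components["HOH"]  # served from the LRU cache
```

Only entries that are actually read pay the decompression cost, and the cache bound caps how many decompressed values stay in memory at once.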
- classmethod from_ccd(ccd, /)
Create a compressed dictionary from a dictionary of chemical components.
- Parameters:
ccd (dict[str, ChemicalComponent]) – Dictionary mapping component IDs to ChemicalComponent objects.
- Returns:
A CompressedChemicalComponentDictionary instance.
- classmethod from_file(path, /)
Load a compressed chemical component dictionary from a file.
Supports both plain pickle (.pkl) and gzipped pickle (.pkl.gz) formats.
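As a rough illustration of handling both pickle variants, the sketch below picks an opener from the file suffix. `load_pickle` is a hypothetical helper, not part of boltz_data, and how `from_file` actually distinguishes the two formats is an assumption.

```python
import gzip
import pickle
from pathlib import Path
from typing import Any, Union


def load_pickle(path: Union[str, Path]) -> Any:
    """Load a pickle stored either plain (.pkl) or gzipped (.pkl.gz)."""
    path = Path(path)
    # Assumption: the trailing ".gz" suffix alone decides whether to gunzip.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)
```

Unpickling executes arbitrary code from the file, so this pattern should only be used on files from a trusted source.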
- boltz_data.ccd.chemical_component_dictionary_from_path(path, /)
Read a chemical component dictionary from a file.
Supports both compressed pickle formats (.pkl.gz, .pkl) and CIF formats (.cif.gz, .cif).
- Parameters:
path (str | Path) – Path to the file containing the chemical component dictionary.
- Returns:
A mapping from component IDs to ChemicalComponent objects.
- Raises:
ValueError – If the file format is not supported.
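The suffix-based dispatch implied by the supported formats might look like the following sketch. `detect_format` is a hypothetical helper introduced for illustration; the library's real matching rules (for example, case sensitivity) are not documented here and are assumed.

```python
from pathlib import Path
from typing import Tuple, Union


def detect_format(path: Union[str, Path]) -> Tuple[str, bool]:
    """Return (kind, gzipped) for a CCD file, where kind is 'pickle' or 'cif'."""
    name = Path(path).name
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[: -len(".gz")]
    if name.endswith(".pkl"):
        return "pickle", gzipped
    if name.endswith(".cif"):
        return "cif", gzipped
    # Mirrors the documented behaviour of raising ValueError for other formats.
    raise ValueError(f"Unsupported chemical component dictionary format: {path}")
```

For example, `detect_format("components.cif.gz")` yields `("cif", True)`, while a path ending in `.json` raises `ValueError`.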
- boltz_data.ccd.chemical_component_from_path(path, /)
Read a single chemical component from a CIF file.
- Parameters:
path (str | Path) – Path to the CIF file containing exactly one chemical component.
- Returns:
The chemical component.
- Raises:
ValueError – If the file does not contain exactly one chemical component.
- boltz_data.ccd.chemical_component_from_rcsb(*, comp_id)
Fetch a chemical component from RCSB PDB.
- Parameters:
comp_id (str) – The component ID to fetch.
- Returns:
The chemical component.
- Raises:
ValueError – If the component ID is not found in RCSB PDB.
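For orientation, RCSB publishes individual component definitions as CIF files under a ligand download path. The sketch below only builds such a URL. `ligand_cif_url` is a hypothetical helper, and whether `chemical_component_from_rcsb` uses this particular endpoint is an assumption, not something this page states.

```python
def ligand_cif_url(comp_id: str) -> str:
    """Build the RCSB ligand CIF download URL for a component ID (assumed endpoint)."""
    normalized = comp_id.strip().upper()
    # Reject empty or non-alphanumeric IDs before they reach the URL.
    if not normalized.isalnum():
        raise ValueError(f"Invalid component ID: {comp_id!r}")
    return f"https://files.rcsb.org/ligands/download/{normalized}.cif"
```

A caller would then fetch the URL and treat a missing file as the "component ID not found" case described above.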
- boltz_data.ccd.compress_chemical_component(chemical_component, /)
Compress a chemical component to a compact binary representation.
The compression format minimizes size by:
- Using null-terminated ASCII strings for text fields
- Storing element symbols as single-byte periodic table indices
- Using 1-byte or 2-byte integers for indices based on molecule size
- Encoding bonds by atom indices rather than atom IDs to save space
- Packing leaving atom flags as single bits within atom records
- Parameters:
chemical_component (ChemicalComponent) – The chemical component to compress.
- Returns:
The compressed representation as bytes.
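Two of the techniques listed above, size-dependent index width and bonds stored by atom index, can be sketched in a few lines. `index_format` and `encode_bonds` are hypothetical helpers, and little-endian byte order for 2-byte values is an assumption; the actual encoder in boltz_data may differ.

```python
import struct


def index_format(num_atoms: int, num_bonds: int) -> str:
    """Pick 1-byte indices when both counts fit in a byte, else 2-byte ones."""
    return "<H" if num_atoms > 255 or num_bonds > 255 else "<B"


def encode_bonds(atom_ids: list, bonds: list) -> bytes:
    """Encode (atom_id_1, atom_id_2, order) bonds as index pairs plus an order byte."""
    fmt = index_format(len(atom_ids), len(bonds))
    # Atom IDs are replaced by their position in the atom list.
    position = {atom_id: i for i, atom_id in enumerate(atom_ids)}
    out = bytearray(struct.pack(fmt, len(bonds)))
    for atom_id_1, atom_id_2, order in bonds:
        out += struct.pack(fmt, position[atom_id_1])
        out += struct.pack(fmt, position[atom_id_2])
        out += struct.pack("<B", order)
    return bytes(out)


water_bonds = encode_bonds(["O", "H1", "H2"], [("O", "H1", 1), ("O", "H2", 1)])
```

For a small molecule each bond costs three bytes here, regardless of how long the atom ID strings are.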
- boltz_data.ccd.decompress_chemical_component(compressed, /)
Decompress a chemical component from its compressed binary representation.
The compression format is a custom binary encoding that minimizes size by:
- Using null-terminated ASCII strings for text fields
- Storing element symbols as single-byte periodic table indices
- Using 1-byte or 2-byte integers for indices based on molecule size
- Encoding bonds by atom indices rather than atom IDs to save space
Binary format structure:
  Component metadata:
    Null-terminated ASCII string: comp_id
    Null-terminated ASCII string: type
    Null-terminated ASCII string: name
  Size indicator:
    1 byte: large_molecule flag (1 if num_atoms > 255 or num_bonds > 255)
    1 or 2 bytes: number of atoms (based on large_molecule flag)
  Atom records (repeated for each atom):
    Null-terminated ASCII string: atom_id
    1 byte: element as periodic table index (1-120)
  Bond records:
    1 or 2 bytes: number of bonds (based on large_molecule flag)
    For each bond:
      1 or 2 bytes: index of first atom in atom list
      1 or 2 bytes: index of second atom in atom list
      1 byte: bond order (1=single, 2=double, 3=triple)
- Parameters:
compressed (bytes) – The compressed chemical component as bytes.
- Returns:
The decompressed ChemicalComponent object.
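The binary format structure listed above can be walked with a small reader. The sketch below follows only the fields in that listing; `DecodedComponent` and `decode_component` are hypothetical names, little-endian byte order for 2-byte values is an assumption, and bytes produced by `compress_chemical_component` may carry fields not listed here (such as charge or leaving-atom flags), so this is not a drop-in replacement.

```python
import struct
from dataclasses import dataclass


@dataclass
class DecodedComponent:
    comp_id: str
    type: str
    name: str
    atoms: list  # (atom_id, periodic table index)
    bonds: list  # (atom_id_1, atom_id_2, order)


def read_cstr(data: bytes, pos: int):
    """Read a null-terminated ASCII string; return it and the next offset."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("ascii"), end + 1


def read_uint(data: bytes, pos: int, width: int):
    """Read a 1- or 2-byte unsigned integer (little-endian assumed)."""
    (value,) = struct.unpack_from("<H" if width == 2 else "<B", data, pos)
    return value, pos + width


def decode_component(data: bytes) -> DecodedComponent:
    comp_id, pos = read_cstr(data, 0)
    type_, pos = read_cstr(data, pos)
    name, pos = read_cstr(data, pos)
    large, pos = read_uint(data, pos, 1)
    width = 2 if large else 1  # the flag selects the width of counts and indices
    num_atoms, pos = read_uint(data, pos, width)
    atoms = []
    for _ in range(num_atoms):
        atom_id, pos = read_cstr(data, pos)
        element, pos = read_uint(data, pos, 1)
        atoms.append((atom_id, element))
    num_bonds, pos = read_uint(data, pos, width)
    bonds = []
    for _ in range(num_bonds):
        first, pos = read_uint(data, pos, width)
        second, pos = read_uint(data, pos, width)
        order, pos = read_uint(data, pos, 1)
        # Map atom indices back to atom IDs.
        bonds.append((atoms[first][0], atoms[second][0], order))
    return DecodedComponent(comp_id, type_, name, atoms, bonds)


# Hand-built bytes for water, following the listing field by field.
sample = (
    b"HOH\x00" + b"NON-POLYMER\x00" + b"WATER\x00"
    + bytes([0])                    # large_molecule flag
    + bytes([3])                    # number of atoms
    + b"O\x00" + bytes([8])
    + b"H1\x00" + bytes([1])
    + b"H2\x00" + bytes([1])
    + bytes([2])                    # number of bonds
    + bytes([0, 1, 1])              # O-H1, single
    + bytes([0, 2, 1])              # O-H2, single
)
decoded = decode_component(sample)
```

Threading an explicit offset through each read keeps the parser free of hidden state, which makes it straightforward to extend if further fields need to be read between the atom and bond records.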
- boltz_data.ccd.get_builtin_chemical_component_dictionary()
Get the builtin chemical component dictionary.
- boltz_data.ccd.get_remote_chemical_component_database()
Get a chemical component database that talks directly to the RCSB API.
- Return type:
RemoteChemicalComponentDatabase
- boltz_data.ccd.read_chemical_component_dictionary_from_mmcif(mmcif, /)
Read chemical component dictionary from mmCIF block.
- Parameters:
mmcif (Document | Block) – The mmCIF block containing chemical component definitions.
- Returns:
Dictionary mapping component IDs to ChemicalComponent objects.