boltz_data.mol#
Boltz molecule utilities.
Functions
|
Convert a BZMol to a CIF block representation. |
|
|
|
|
|
Create a BZMol from an entity definition. |
|
Create a BZMol from an mmCIF file with coordinates. |
|
|
|
Convert an RDKit molecule to a BZMol. |
|
Create a BZMol from a SMILES string using RDKit. |
|
Create a BZMol from a structure definition. |
|
Generate an SVG visualization of a BZMol structure. |
|
Concatenate multiple BZBioMol objects into a single BZBioMol. |
|
Generate a 3D conformer for a BZMol using RDKit. |
|
Generate 2D coordinates for a BZMol using RDKit. |
|
Find interfaces between chains using atoms within threshold distance. |
Calculate bounding spheres for each residue centered at the residue centroid. |
|
|
Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol. |
|
Convert a BZMol back to an mmCIF block. |
|
Convert a BZMol or BZBioMol to an RDKit molecule. |
|
Save a BZMol visualization as an SVG file. |
|
Convert a BZMol or BZBioMol to a SMILES string. |
|
Convert a BZBioMol to a structure definition. |
|
Extract a subset of a BZBioMol by selecting specific chains. |
|
Create a new BZMol with modified fields by replacing specified attributes. |
|
Validate a BZBioMol instance for consistency. |
Classes
|
Biomolecular structure with residues and chains. |
|
Base representation of a molecular structure with atoms and bonds. |
- class boltz_data.mol.BZBioMol(**data)[source]#
Biomolecular structure with residues and chains.
Extends BZMol to include residue and chain information for proteins, DNA, RNA, and other biomolecules.
- Parameters:
atom_element (Annotated[ndarray[tuple[int, ...], dtype[uint8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_name (Annotated[ndarray[tuple[int, ...], dtype[str_]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_charge (Annotated[ndarray[tuple[int, ...], dtype[int8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
bond_atoms (Annotated[ndarray[tuple[int, ...], dtype[uint32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
bond_order (Annotated[ndarray[tuple[int, ...], dtype[uint8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_resolved (Annotated[ndarray[tuple[int, ...], dtype[bool]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_coordinates (Annotated[ndarray[tuple[int, ...], dtype[float32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_b_factor (Annotated[ndarray[tuple[int, ...], dtype[float32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)] | None)
atom_residue (Annotated[ndarray[tuple[int, ...], dtype[uint32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
residue_name (Annotated[ndarray[tuple[int, ...], dtype[str_]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
residue_number (Annotated[ndarray[tuple[int, ...], dtype[int32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)] | None)
residue_chain (Annotated[ndarray[tuple[int, ...], dtype[uint16]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
chain_id (Annotated[ndarray[tuple[int, ...], dtype[str_]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
chain_description (Annotated[ndarray[tuple[int, ...], dtype[str_]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)] | None)
- property chain_residues: list[slice | list[int]]#
Get residue indices for each chain as slices or lists.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- property residue_any_resolved: ndarray[tuple[int, ...], dtype[bool]]#
Get a boolean array of shape (n_residues,) indicating which residues have any resolved atoms.
- property residue_atoms: list[slice | list[int]]#
Get atom indices for each residue as slices or lists.
- property residue_ordinal: ndarray[tuple[int, ...], dtype[uint16]]#
0-indexed ordinal number of each residue within its chain.
- atom_b_factor: Optional[Annotated[ndarray[tuple[int,...],dtype[float32]]]]#
Optional array of shape (n_atoms,) with B-factors for each atom.
- residue_name: Annotated[ndarray[tuple[int,...],dtype[str_]]]#
Residue names (e.g., ‘ALA’, ‘GLY’ for proteins, ‘A’, ‘G’ for nucleic acids).
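The residue_atoms, chain_residues, and residue_ordinal properties above all derive groupings from per-item index arrays such as atom_residue and residue_chain. The following is a minimal sketch of how such groupings could be computed with plain lists. The helper names `group_indices` and `ordinal_within_group` are hypothetical and are not part of boltz_data; the library's actual implementation may differ.

```python
from __future__ import annotations


def group_indices(owner: list[int], n_groups: int) -> list[slice | list[int]]:
    """Group item indices by owner, using a slice when a group is contiguous."""
    members: list[list[int]] = [[] for _ in range(n_groups)]
    for index, group in enumerate(owner):
        members[group].append(index)

    grouped: list[slice | list[int]] = []
    for indices in members:
        # Indices are collected in increasing order, so a group is contiguous
        # exactly when its span equals its length.
        if indices and indices[-1] - indices[0] + 1 == len(indices):
            grouped.append(slice(indices[0], indices[-1] + 1))
        else:
            grouped.append(indices)
    return grouped


def ordinal_within_group(owner: list[int]) -> list[int]:
    """0-indexed position of each item among the items that share its owner."""
    seen: dict[int, int] = {}
    ordinals: list[int] = []
    for group in owner:
        ordinals.append(seen.get(group, 0))
        seen[group] = seen.get(group, 0) + 1
    return ordinals


# Hypothetical data: six atoms in three residues, three residues in two chains.
atom_residue = [0, 0, 0, 1, 2, 1]
residue_chain = [0, 0, 1]

residue_atoms = group_indices(atom_residue, n_groups=3)
residue_ordinal = ordinal_within_group(residue_chain)
```

In this sketch residue 1 owns the non-adjacent atoms 3 and 5, so it is returned as a list, while residues 0 and 2 are returned as slices.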
- class boltz_data.mol.BZMol(**data)[source]#
Base representation of a molecular structure with atoms and bonds.
This object can represent both single molecules and multiple disconnected molecules, in a similar manner to an RDKit Mol object.
Unlike a Mol object, information is stored by property, rather than by atom. For example, the atom_name property is an array of shape (n_atoms,) containing the names of all atoms.
- Parameters:
atom_element (Annotated[ndarray[tuple[int, ...], dtype[uint8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_name (Annotated[ndarray[tuple[int, ...], dtype[str_]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_charge (Annotated[ndarray[tuple[int, ...], dtype[int8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
bond_atoms (Annotated[ndarray[tuple[int, ...], dtype[uint32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
bond_order (Annotated[ndarray[tuple[int, ...], dtype[uint8]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_resolved (Annotated[ndarray[tuple[int, ...], dtype[bool]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
atom_coordinates (Annotated[ndarray[tuple[int, ...], dtype[float32]], GetPydanticSchema(get_pydantic_core_schema=~boltz_data.pydantic._get_schema, get_pydantic_json_schema=None), AfterValidator(func=~boltz_data.pydantic.Shape.<locals>.validate)])
- __init__(**data)[source]#
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- Return type:
None
- property angle_atoms: Annotated[ndarray[tuple[int, ...], dtype[uint16]], GetPydanticSchema(get_pydantic_core_schema=_get_schema, get_pydantic_json_schema=None)]#
Get array of shape (n_angles, 3) with atom indices for each angle.
- property angle_resolved: Annotated[ndarray[tuple[int, ...], dtype[bool]], GetPydanticSchema(get_pydantic_core_schema=_get_schema, get_pydantic_json_schema=None)]#
Get boolean array of shape (n_angles,) indicating which angles have valid coordinates.
- property atom_adjacency_matrix: ndarray[tuple[int, ...], dtype[bool]]#
Get the atom-atom adjacency matrix based on the bonds.
- property atom_is_aromatic: Annotated[ndarray[tuple[int, ...], dtype[bool]], GetPydanticSchema(get_pydantic_core_schema=_get_schema, get_pydantic_json_schema=None)]#
- property atom_num_pi_electrons: Annotated[ndarray[tuple[int, ...], dtype[uint8]], GetPydanticSchema(get_pydantic_core_schema=_get_schema, get_pydantic_json_schema=None)]#
- property bond_is_aromatic: Annotated[ndarray[tuple[int, ...], dtype[bool]], GetPydanticSchema(get_pydantic_core_schema=_get_schema, get_pydantic_json_schema=None)]#
- property bond_length: ndarray[tuple[int, ...], dtype[float32]]#
Calculate bond lengths if coordinates are available.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'frozen': True, 'validate_default': True}#
Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- atom_element: Annotated[ndarray[tuple[int,...],dtype[uint8]]]#
Atomic numbers for each atom, stored as an int array.
- atom_name: Annotated[ndarray[tuple[int,...],dtype[str_]]]#
The atom names (e.g., ‘CA’, ‘N’, ‘C’, ‘O’ for proteins).
- atom_charge: Annotated[ndarray[tuple[int,...],dtype[int8]]]#
Formal charges for each atom, stored as an int array indexed by atom.
- bond_atoms: Annotated[ndarray[tuple[int,...],dtype[uint32]]]#
Array of shape (n_bonds, 2) with atom indices for each bond.
- bond_order: Annotated[ndarray[tuple[int,...],dtype[uint8]]]#
Bond orders (1=single, 2=double, 3=triple), stored as an int array.
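To illustrate the property-wise layout, the sketch below derives an adjacency matrix, bond lengths, and angle triples from a bond_atoms table and per-atom coordinates, mirroring what atom_adjacency_matrix, bond_length, and angle_atoms describe. It uses plain Python lists rather than NumPy arrays, the water coordinates are made up, and the ordering of the angle triples is an assumption rather than the library's documented behaviour.

```python
import math

# Hypothetical property-wise data for a water molecule.
atom_element = [8, 1, 1]                  # atomic numbers, one entry per atom
bond_atoms = [(0, 1), (0, 2)]             # shape (n_bonds, 2)
atom_coordinates = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]


def adjacency_matrix(n_atoms, bonds):
    """Symmetric boolean matrix with True where two atoms share a bond."""
    matrix = [[False] * n_atoms for _ in range(n_atoms)]
    for a, b in bonds:
        matrix[a][b] = matrix[b][a] = True
    return matrix


def bond_lengths(bonds, coordinates):
    """Euclidean distance between the two atoms of each bond."""
    return [math.dist(coordinates[a], coordinates[b]) for a, b in bonds]


def angle_triples(n_atoms, bonds):
    """All (i, j, k) with j bonded to both i and k, and i < k."""
    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    triples = []
    for centre, bonded in enumerate(neighbours):
        bonded = sorted(bonded)
        for position, first in enumerate(bonded):
            for second in bonded[position + 1:]:
                triples.append((first, centre, second))
    return triples


adjacency = adjacency_matrix(len(atom_element), bond_atoms)
lengths = bond_lengths(bond_atoms, atom_coordinates)
angles = angle_triples(len(atom_element), bond_atoms)
```

Because every quantity is indexed by atom or bond position, each derived table can be computed from the flat arrays without per-atom objects.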
- boltz_data.mol.bzcif_from_bzmol(bzmol, /)[source]#
Convert a BZMol to a CIF block representation.
- Return type:
Block
- Parameters:
bzmol (BZMol)
- boltz_data.mol.bzmol_from_chemical_component(chemical_component)[source]#
- Return type:
- Parameters:
chemical_component (ChemicalComponent | str)
- boltz_data.mol.bzmol_from_chemical_components(*, chemical_components, chain_id, bonds=None, residue_numbers=None, description=None)[source]#
- boltz_data.mol.bzmol_from_definition(definition, /, *, chemical_component_dictionary=None, chain_id, residue_numbers=None)[source]#
Create a BZMol from an entity definition.
- boltz_data.mol.bzmol_from_mmcif(mmcif, *, chemical_component_dictionary=None)[source]#
Create a BZMol from an mmCIF file with coordinates.
This function:
1. Parses entity definitions from the mmCIF
2. Creates BZMols for each entity instance
3. Concatenates them into a single structure
4. Maps atom coordinates from the mmCIF to the BZMol
- Parameters:
mmcif (Block) – The mmCIF block containing structure data.
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None) – Dictionary mapping component IDs to ChemicalComponent objects.
- Return type:
- Returns:
A BZMol containing all atoms with their coordinates and a mask indicating which atoms have valid coordinates.
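The last step, mapping observed coordinates onto the atoms built from entity definitions, can be sketched as a keyed lookup that also produces the resolved mask. This is an illustration only: the key layout (chain, residue number, atom name), the helper name `map_coordinates`, and the zero placeholder for unresolved atoms are assumptions, not the library's implementation.

```python
# Hypothetical template atoms built from entity definitions,
# keyed by (chain ID, residue number, atom name).
template_atoms = [
    ("A", 1, "N"),
    ("A", 1, "CA"),
    ("A", 1, "C"),
    ("A", 2, "N"),
]

# Hypothetical coordinate records read from the mmCIF atom table.
# Only two of the four template atoms were observed.
observed = {
    ("A", 1, "N"): (1.0, 2.0, 3.0),
    ("A", 1, "CA"): (2.0, 2.5, 3.5),
}


def map_coordinates(template, observed_coordinates):
    """Return per-atom coordinates and a mask of atoms that were observed."""
    coordinates = []
    resolved = []
    for key in template:
        xyz = observed_coordinates.get(key)
        resolved.append(xyz is not None)
        # Unresolved atoms get a placeholder; the mask tells callers to ignore it.
        coordinates.append(xyz if xyz is not None else (0.0, 0.0, 0.0))
    return coordinates, resolved


atom_coordinates, atom_resolved = map_coordinates(template_atoms, observed)
```

Keeping every template atom and marking missing ones in a mask preserves a fixed atom ordering regardless of which atoms the experiment resolved.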
- boltz_data.mol.bzmol_from_rdmol(rdmol, /, *, conformer_id=-1)[source]#
Convert an RDKit molecule to a BZMol.
This function extracts the molecular structure from an RDKit molecule to create a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.
- Parameters:
- Return type:
- Returns:
A BZMol object containing atoms and bonds without coordinates, residues, or chains.
Example
>>> from rdkit import Chem
>>> rdmol = Chem.MolFromSmiles("CCO")
>>> mol = bzmol_from_rdmol(rdmol)
>>> mol.num_atoms
3  # Without hydrogens
- boltz_data.mol.bzmol_from_smiles(smiles, /)[source]#
Create a BZMol from a SMILES string using RDKit.
This function converts a SMILES string to an RDKit molecule, then converts it to a BZMol object. The resulting BZMol contains only atoms and bonds, without residue or chain information.
- Parameters:
smiles (str) – A valid SMILES string representing the molecule.
- Return type:
- Returns:
A BZMol object containing atoms and bonds without coordinates, residues, or chains.
- Raises:
ValueError – If the SMILES string is invalid or cannot be parsed.
Example
>>> mol = bzmol_from_smiles("CCO")  # Ethanol
>>> mol.num_atoms
3  # Without hydrogens
>>> mol.num_residues
0  # No residues defined
- boltz_data.mol.bzmol_from_structure(structure, /, chemical_component_dictionary=None)[source]#
Create a BZMol from a structure definition.
- Return type:
- Parameters:
structure (StructureDefinition)
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)
- boltz_data.mol.bzmol_to_svg(mol, /, *, box_width=60, box_height=30, padding=5)[source]#
Generate an SVG visualization of a BZMol structure.
Each chain is shown as a separate section with residues as rows and atoms as columns.
- Parameters:
- Return type:
- Returns:
SVG string representing the BZMol structure.
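The row-and-column layout described above can be sketched as a small grid generator for a single chain. The function `residue_grid_svg` and its input format are hypothetical. Only the box_width, box_height, and padding names and defaults come from the signature, and how the library actually uses them is an assumption.

```python
from xml.sax.saxutils import escape


def residue_grid_svg(residues, *, box_width=60, box_height=30, padding=5):
    """Draw one chain: one row per residue, one labelled box per atom.

    residues: one list of atom names per residue (hypothetical input format).
    """
    n_columns = max((len(atoms) for atoms in residues), default=0)
    width = padding + n_columns * (box_width + padding)
    height = padding + len(residues) * (box_height + padding)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for row, atoms in enumerate(residues):
        y = padding + row * (box_height + padding)
        for column, atom_name in enumerate(atoms):
            x = padding + column * (box_width + padding)
            parts.append(
                f'<rect x="{x}" y="{y}" width="{box_width}" height="{box_height}" '
                'fill="none" stroke="black"/>'
            )
            parts.append(
                f'<text x="{x + box_width / 2}" y="{y + box_height / 2}" '
                'text-anchor="middle" dominant-baseline="middle">'
                f"{escape(atom_name)}</text>"
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Two residues with three and two atoms respectively.
svg = residue_grid_svg([["N", "CA", "C"], ["N", "CA"]])
```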
- boltz_data.mol.concat_bzmols(*bzmols)[source]#
Concatenate multiple BZBioMol objects into a single BZBioMol.
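When index-based tables such as bond_atoms are concatenated, the indices of every molecule after the first must be shifted by the number of atoms that precede it. The sketch below shows that offsetting on plain dictionaries. The helper `concat_bonded` and the dictionary layout are hypothetical stand-ins, and residue and chain indices (not shown) would presumably need the same treatment.

```python
def concat_bonded(molecules):
    """Concatenate atom names and bond tables, shifting bond indices.

    molecules: list of dicts with 'atom_name' and 'bond_atoms' keys
    (a hypothetical stand-in for the real objects).
    """
    atom_name = []
    bond_atoms = []
    for molecule in molecules:
        # Atoms already collected determine the index shift for this molecule.
        offset = len(atom_name)
        atom_name.extend(molecule["atom_name"])
        bond_atoms.extend(
            (a + offset, b + offset) for a, b in molecule["bond_atoms"]
        )
    return {"atom_name": atom_name, "bond_atoms": bond_atoms}


water = {"atom_name": ["O", "H1", "H2"], "bond_atoms": [(0, 1), (0, 2)]}
carbon_monoxide = {"atom_name": ["C", "O"], "bond_atoms": [(0, 1)]}

combined = concat_bonded([water, carbon_monoxide])
```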
- boltz_data.mol.generate_conformer(bzmol, /, *, seed)[source]#
Generate a 3D conformer for a BZMol using RDKit.
- boltz_data.mol.generate_depiction(bzmol, /, *, match_3d=True)[source]#
Generate 2D coordinates for a BZMol using RDKit.
- boltz_data.mol.get_molecular_interfaces(bzmol, /, threshold=5.0)[source]#
Find interfaces between chains using atoms within threshold distance.
Uses a two-pass approach:
1. Find residue pairs with overlapping bounding boxes (fast sweep-and-prune)
2. Check atom distances only for those residue pairs (KDTree per pair)
This leverages residue grouping to avoid O(n²) atom comparisons.
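A minimal sketch of the two-pass idea follows, under several assumptions: the sweep runs along the x axis only, each box is padded by half the threshold, and the second pass uses a brute-force distance check in place of a KDTree to stay dependency-free. All function names are hypothetical, and the return value (pairs of residue indices) may not match what the library returns.

```python
import math
from itertools import product


def bounding_box(points, margin):
    """Axis-aligned box around the points, grown by margin on every side."""
    xs, ys, zs = zip(*points)
    lower = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    upper = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    return lower, upper


def boxes_overlap(a, b):
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def candidate_pairs(boxes):
    """Sweep-and-prune along x: compare only boxes whose x-intervals overlap."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active, pairs = [], []
    for current in order:
        min_x = boxes[current][0][0]
        # Drop boxes that end before the current one starts.
        active = [other for other in active if boxes[other][1][0] >= min_x]
        for other in active:
            if boxes_overlap(boxes[current], boxes[other]):
                pairs.append((min(current, other), max(current, other)))
        active.append(current)
    return pairs


def interface_residue_pairs(residue_coords, residue_chain, threshold=5.0):
    # Pass 1: padding by threshold / 2 guarantees that two residues with any
    # atoms within threshold of each other have overlapping boxes.
    boxes = [bounding_box(points, threshold / 2) for points in residue_coords]
    hits = []
    for i, j in candidate_pairs(boxes):
        if residue_chain[i] == residue_chain[j]:
            continue  # only inter-chain contacts count as interfaces
        # Pass 2: exact atom-atom check (brute force here, not a KDTree).
        pairs = product(residue_coords[i], residue_coords[j])
        if any(math.dist(p, q) <= threshold for p, q in pairs):
            hits.append((i, j))
    return sorted(hits)


# Four residues: two in chain 0, two in chain 1; the last one is far away.
coords = [[(0, 0, 0), (1, 0, 0)], [(2, 0, 0)], [(4, 0, 0)], [(50, 0, 0)]]
chains = [0, 0, 1, 1]
contacts = interface_residue_pairs(coords, chains, threshold=5.0)
```

The distant residue is discarded in the first pass without any atom-level distance computation, which is where the saving over an all-pairs atom comparison comes from.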
- boltz_data.mol.get_residue_bounding_spheres_around_centroid(bzmol, /)[source]#
Calculate bounding spheres for each residue centered at the residue centroid.
- Return type:
Spheres
- Parameters:
bzmol (BZBioMol)
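A sphere centered at the centroid can be computed by averaging the atom positions and taking the largest centroid-to-atom distance as the radius. The sketch below shows this for a single residue; the function name is hypothetical and the library's Spheres return type is not modelled. Such a sphere is generally not the smallest enclosing sphere, but it is cheap to compute.

```python
import math


def bounding_sphere_around_centroid(points):
    """Return (centroid, radius) of a sphere enclosing all points."""
    n_points = len(points)
    centroid = tuple(sum(p[axis] for p in points) / n_points for axis in range(3))
    # The farthest atom from the centroid fixes the radius.
    radius = max(math.dist(centroid, p) for p in points)
    return centroid, radius


# Hypothetical residue with two atoms two units apart.
centre, radius = bounding_sphere_around_centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```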
- boltz_data.mol.iterate_assemblies(*, mmcif, bzmol, max_atoms=None)[source]#
Generate biological assemblies from an mmCIF file and asymmetric unit BZBioMol.
- boltz_data.mol.mmcif_from_bzmol(bzmol, /, chemical_component_dictionary=None, name='pred')[source]#
Convert a BZMol back to an mmCIF block.
- Parameters:
- Return type:
Block
- Returns:
An mmCIF block representing the structure in the BZMol.
- boltz_data.mol.rdmol_from_bzmol(bzmol, /)[source]#
Convert a BZMol or BZBioMol to an RDKit molecule.
- boltz_data.mol.save_bzmol_svg(mol, filepath, /, *, box_width=60, box_height=30, padding=5)[source]#
Save a BZMol visualization as an SVG file.
- boltz_data.mol.structure_from_bzmol(bzmol, /, chemical_component_dictionary=None)[source]#
Convert a BZBioMol to a structure definition.
- Return type:
StructureDefinition
- Parameters:
bzmol (BZBioMol)
chemical_component_dictionary (Mapping[str, ChemicalComponent] | None)
- boltz_data.mol.subset_bzmol(bzmol, *, chain_ids=None)[source]#
Extract a subset of a BZBioMol by selecting specific chains.
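Selecting chains from index-linked arrays means filtering the chain table and then renumbering the indices that point into it. The sketch below shows this for chain_id, residue_chain, and residue_name using plain lists. The helper `subset_chains` is hypothetical, and atoms and bonds, which would need the same renumbering, are omitted for brevity.

```python
def subset_chains(chain_id, residue_chain, residue_name, keep):
    """Keep only the chains whose ID is in keep and renumber residue_chain."""
    kept_chains = [i for i, cid in enumerate(chain_id) if cid in keep]
    # Map old chain indices to their position in the reduced chain table.
    new_index = {old: new for new, old in enumerate(kept_chains)}
    kept_residues = [r for r, c in enumerate(residue_chain) if c in new_index]
    return {
        "chain_id": [chain_id[i] for i in kept_chains],
        "residue_name": [residue_name[r] for r in kept_residues],
        "residue_chain": [new_index[residue_chain[r]] for r in kept_residues],
    }


# Hypothetical structure: chains A and B are protein, chain C is nucleic acid.
subset = subset_chains(
    chain_id=["A", "B", "C"],
    residue_chain=[0, 0, 1, 2, 2],
    residue_name=["ALA", "GLY", "SER", "A", "G"],
    keep={"A", "C"},
)
```

After the subset, chain C is renumbered from index 2 to index 1 so that residue_chain still points at valid rows of chain_id.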