Working with mmCIF Files#

The boltz_data.cif module provides tools for reading and writing mmCIF files, the standard format for protein structure data.

What are mmCIF Files?#

CIF (Crystallographic Information File) is a format for storing crystallographic data. CIF files are plain-text files containing one or more data blocks, each of which contains one or more tables of data.
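To make the block-and-table layout concrete, the sketch below uses only the Python standard library to list the data blocks in a small CIF string and count the `loop_` tables in each. The sample text and the helper name `summarize_blocks` are illustrative assumptions. This is not a full CIF parser (it ignores quoting and multi-line text fields), so real files should still be read with gemmi.

```python
# Hypothetical two-block CIF text, made up for illustration.
SAMPLE_CIF = """\
data_first
_cell.length_a 10.0
loop_
_atom_site.id
_atom_site.type_symbol
1 C
2 N

data_second
_entry.id second
"""


def summarize_blocks(text: str) -> dict[str, int]:
    """Map each data block name to the number of `loop_` tables it holds.

    Simplified on purpose: assumes `data_` and `loop_` start a line and
    never appear inside quoted or multi-line values.
    """
    summary: dict[str, int] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("data_"):
            # A `data_<name>` line opens a new block.
            current = stripped[len("data_"):]
            summary[current] = 0
        elif stripped.lower() == "loop_" and current is not None:
            # A `loop_` line opens a multi-row table in the current block.
            summary[current] += 1
    return summary


print(summarize_blocks(SAMPLE_CIF))
```

Single `_category.item value` lines, such as `_cell.length_a` above, also carry data; they behave like one-row tables and are not counted by this helper.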

An extension called mmCIF (macromolecular CIF) is the standard format for structural biology data. An mmCIF file can include:

  • The 3D coordinates of atoms

  • Polymer sequences

  • Experimental metadata (resolution, method, etc.)

  • Biological assembly information

  • Chemical component information
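As an illustration of the first item, atom coordinates are stored in the `_atom_site` table of an mmCIF file. The following standard-library sketch pulls them out of a tiny hand-written fragment. The sample data and the helper name `read_atom_site` are assumptions made for this example, and the whitespace-based splitting does not handle quoted values, so treat it as a picture of the layout rather than a replacement for gemmi.

```python
# Hypothetical mmCIF fragment, trimmed to a handful of `_atom_site` columns.
SAMPLE_MMCIF = """\
data_demo
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 N 11.10 2.50 -3.00
2 CA 12.40 2.90 -2.40
"""


def read_atom_site(text: str) -> list[dict[str, str]]:
    """Return the rows of the first `_atom_site` loop as dictionaries."""
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    in_loop = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "loop_":
            if rows:
                break  # the next table begins, so `_atom_site` has ended
            in_loop, columns = True, []
        elif not in_loop:
            continue
        elif stripped.startswith("_"):
            if rows:
                break
            if stripped.startswith("_atom_site."):
                # Keep the item name after the category prefix.
                columns.append(stripped.split(".", 1)[1])
            else:
                in_loop = False  # this loop belongs to another category
        elif not stripped or stripped.startswith(("#", "data_")):
            if rows:
                break
        elif columns:
            # A data row: pair each value with its column name.
            rows.append(dict(zip(columns, stripped.split())))
    return rows


atoms = read_atom_site(SAMPLE_MMCIF)
coordinates = [
    (atom["label_atom_id"], float(atom["Cartn_x"]), float(atom["Cartn_y"]), float(atom["Cartn_z"]))
    for atom in atoms
]
print(coordinates)
```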

The boltz_data.cif module builds upon the gemmi library to provide a consistent interface for reading and writing CIF files.

Reading CIF Files#

The reading and writing functions are built upon gemmi’s Document object (representing a full file) and Block object (representing a single data block).

There are two functions for reading CIF files, depending on whether you want the whole file as a document or just its single data block.

from boltz_data.cif import read_cif_from_file, read_single_cif_from_file

# Reads in the CIF as a `gemmi.cif.Document`
cif = read_cif_from_file("structure.cif")

# Reads in the CIF as a `gemmi.cif.Block`
# Equivalent to `read_cif_from_file(path).sole_block()`
cif_block = read_single_cif_from_file("structure.cif")

These functions use smart_open under the hood, which means they can read from local files, URLs, and cloud storage paths. They also automatically support compressed files such as .cif.gz and .cif.bz2.
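The idea of choosing a decompressor from the file extension can be sketched with the standard library alone. The helpers `open_maybe_compressed` and `round_trip` below are hypothetical names for this example; the code only illustrates the concept for local files and is not how smart_open or boltz_data implement it.

```python
import bz2
import gzip
import tempfile
from pathlib import Path

# Map a filename suffix to the function that opens that kind of file.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_maybe_compressed(path, mode="rt"):
    """Open `path` in text mode, picking a codec from its final suffix."""
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, mode, encoding="utf-8")


def round_trip(name: str, contents: str) -> str:
    """Write `contents` to a temporary file called `name` and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / name
        with open_maybe_compressed(target, "wt") as handle:
            handle.write(contents)
        with open_maybe_compressed(target, "rt") as handle:
            return handle.read()


CIF_TEXT = "data_demo\n_entry.id demo\n"
for name in ("demo.cif", "demo.cif.gz", "demo.cif.bz2"):
    # The same call works regardless of whether the file is compressed.
    print(name, round_trip(name, CIF_TEXT) == CIF_TEXT)
```

The caller never names the compression format; the suffix alone decides it, which is the behavior the reading functions expose for .cif.gz and .cif.bz2 paths.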

# Read a CIF directly from a URL
cif = read_single_cif_from_file("https://files.rcsb.org/download/1ABC.cif")

Writing CIF Files#

The writing function can write either a gemmi.cif.Block or a gemmi.cif.Document to a file. Like the reading functions, it automatically supports compressed files and remote paths.

from boltz_data.cif import write_cif

# Write a `gemmi.cif.Block` or `gemmi.cif.Document`
write_cif("output.cif", contents=cif)

Parsing mmCIF Files#

The functions above only load the raw CIF data. To work with that data, convert it into one of the higher-level representations described below.

Structure Definition#

The get_structure_from_mmcif() function can be used to parse an mmCIF file into a StructureDefinition. This captures the entities and chains of the structure, but without coordinate information.

from boltz_data.cif import read_single_cif_from_file, get_structure_from_mmcif

# `chemical_component_dictionary` is assumed to have been loaded beforehand
cif = read_single_cif_from_file("structure.cif")
structure = get_structure_from_mmcif(cif, chemical_component_dictionary=chemical_component_dictionary)

BZMol#

Generally, you will want to convert the mmCIF file into a BZBioMol object. This is the standard molecular representation of this package, and contains coordinate information.

from boltz_data.cif import read_single_cif_from_file
from boltz_data.mol import bzmol_from_mmcif

# Read and convert (`ccd` is a chemical component dictionary loaded beforehand)
cif = read_single_cif_from_file("structure.cif")
bzmol = bzmol_from_mmcif(cif, chemical_component_dictionary=ccd)

You can also convert a BZBioMol object back into an mmCIF file, using the mmcif_from_bzmol() function.

from boltz_data.mol import mmcif_from_bzmol
from boltz_data.cif import write_cif

# Convert BZMol to CIF block
cif_block = mmcif_from_bzmol(bzmol, chemical_component_dictionary=ccd, name="my_structure")

# Write the block to a file
write_cif("output.cif", contents=cif_block)

API Reference#

For detailed API documentation, see: