BERKELEY, Calif. — The OpenFold Consortium has announced a major update to its OpenFold3 system and released its training datasets and full-stack tooling to support reproducible biomolecular artificial intelligence research.
OpenFold3 is an open-source deep learning system designed to predict the three-dimensional structures of biomolecular complexes from sequence and molecular inputs. The platform can model interactions between proteins, small molecules and nucleic acids, supporting research in drug discovery, protein engineering and basic biology.
The latest release includes publicly available training datasets, model weights, training and inference code and evaluation scripts under permissive licenses, enabling researchers to independently reproduce results, benchmark performance and retrain models for new applications. The datasets are being distributed through Amazon Web Services’ Registry of Open Data.
“OpenFold was built on the principle that foundational AI for biology should be open, reproducible, and auditable,” said Woody Sherman, Executive Committee Chairperson of the OpenFold Consortium and Chief Innovation Officer at PsiThera. “By releasing OpenFold3 with open data, permissive licensing, and transparent workflows, we are enabling independent validation and rapid iteration so researchers can turn cofolding models into reliable scientific infrastructure that accelerates drug discovery and deepens our mechanistic understanding of biology.”
The consortium said the new release provides an end-to-end open cofolding stack designed to support evaluation workflows as well as downstream development. Updated benchmarks in the OpenFold3 white paper show performance competitive with AlphaFold3 across most evaluated modalities and strong results relative to earlier open cofolding models.
“Releasing the full training stack behind OpenFold3 is a major milestone for reproducible biomolecular AI. It allows the community not only to run the model, but to inspect, retrain, and push the technology forward,” said Nazim Bouatta, Ph.D., OpenFold Advisor.
The dataset release is intended to lower barriers for independent training and method development, allowing researchers to validate results, retrain models for new scientific questions and compare approaches using shared datasets.
To support broader adoption, the consortium has also launched the OpenFold3 portal, which provides onboarding documentation, installation guidance, reference inference pipelines, evaluation scripts and dataset access information. A public community channel has been established for technical questions and issue tracking.
“The power of OpenFold3 is that it’s not just a model, it’s a fully open foundation the community can adapt and extend,” said Arman Zaribafiyan, Head of Strategic Alliances at SandboxAQ. “At SandboxAQ, we’ve already built on earlier OpenFold advances in our AQAffinity model for binding-affinity prediction, and we’re now adopting OpenFold3 to supercharge those capabilities even further. That kind of open, collaborative development is exactly what biomolecular AI needs to deliver meaningful results, accelerated discovery and eventually new medicines faster.”
The consortium said antibody–antigen complex prediction remains a challenging area for current computational approaches, including OpenFold3. Improving performance in this area will be a major focus for the project in 2026, with planned work involving expanded datasets, improved benchmarks and new modeling strategies targeting immune-related complexes.
The OpenFold Consortium said the release is part of its broader effort to ensure that foundational AI tools for biology remain open and accessible to the global scientific community. By making models, data and infrastructure available, the consortium aims to support reproducibility and accelerate innovation across academia, biotechnology companies, pharmaceutical developers and nonprofit research groups.


