
Published in Nature Scientific Data, this paper presents a comprehensive dataset of predicted protein structures for 42,042 distinct human proteins, generated using NVIDIA’s BioNeMo platform combined with Innophore’s CavitomiX technology. The dataset integrates predictions from AlphaFold 2, OpenFold, and ESMFold into a single, quality-controlled resource, representing the most complete structural coverage of the human proteome available for machine learning purposes. It is offered in both unedited and refined formats to support applications such as structure-based drug design and protein function prediction.