Straintest Bio Initiative

Unlocking biological data for the AI era

Building AI-native search infrastructure for biological datasets.

01

What Protplex is

Biology has produced an enormous amount of high-signal data. Most of it was built for human lookup, not for AI systems that need to search, retrieve, compare, and reason across thousands of entries at once.

The core problem is fragmentation.

A single biological entry contains:

  • Molecular structure and 3D geometry
  • Sequence annotations and domain architecture
  • Experimental metadata and provenance
  • Literature references and external identifiers
  • Domain-specific labels and ontology terms

Existing interfaces surface only a fraction of this.

Protplex indexes all of it. Entries become discoverable by the signal they carry, not just the labels they have been assigned. Protplex makes existing databases easier to navigate, especially when a query does not map cleanly to a known entry or identifier.

02

Why it exists

Bringing biology into the agentic AI era is a data access problem.

AI coding tools became powerful because models could interact with code, documentation, repositories, and execution environments. Biology needs the same kind of accessible substrate: data that AI systems can search, retrieve, and reason over in loops.

Without it, models fall back on their training distribution. Retrieval skews toward well-known proteins, pathways, and organisms. Relevant but less visible entries stay buried.

Protplex is built on the following premise: retrieval should surface candidates by relevance, without bias against less visible entries.

03

The textualization layer

Textualization is our name for the process of converting what matters about a biological entry into rich text, combining the data available from every modality.

For each entry, Protplex collects and indexes signals across:

  • Structure: 3D geometry, binding sites, and secondary structure elements
  • Sequence: residue composition, conservation, and domain architecture
  • Function: GO terms, pathway membership, and known interactions
  • Context: experimental method, resolution, organism, and provenance
  • Relationships: homologs, variants, and related entries across databases

The result is an entry that can be retrieved based on what it does and how it relates to other biology, not just what it is named.
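As a rough illustration of the idea, the sketch below flattens the five signal groups listed above into one labeled text block, then ranks entries by how many query terms appear in that text. It is a minimal sketch under stated assumptions, not Protplex's implementation: the `Entry` record, its field names, the `textualize` and `search` functions, and the `EX-…` identifiers are all hypothetical, and simple term matching stands in for whatever retrieval method is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical record; the fields mirror the signal groups above,
    not any actual Protplex schema."""
    identifier: str
    name: str
    structure: list[str] = field(default_factory=list)
    sequence: list[str] = field(default_factory=list)
    function: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    relationships: list[str] = field(default_factory=list)


def textualize(entry: Entry) -> str:
    """Flatten every non-empty signal group into one labeled text block."""
    sections = [
        ("Structure", entry.structure),
        ("Sequence", entry.sequence),
        ("Function", entry.function),
        ("Context", entry.context),
        ("Relationships", entry.relationships),
    ]
    lines = [f"{entry.identifier} ({entry.name})"]
    for label, values in sections:
        if values:
            joined = "; ".join(values)
            lines.append(f"{label}: {joined}")
    return "\n".join(lines)


def search(entries: list[Entry], query: str) -> list[str]:
    """Rank entries by the number of query terms found in their text.

    Assumption: plain substring matching is used here only to keep the
    example self-contained.
    """
    terms = query.lower().split()
    scored = []
    for entry in entries:
        text = textualize(entry).lower()
        score = sum(term in text for term in terms)
        if score:
            scored.append((score, entry.identifier))
    # Highest score first; ties broken by identifier for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [identifier for _, identifier in scored]


# Made-up placeholder entries, not real database records.
examples = [
    Entry("EX-001", "uncharacterized protein",
          function=["zinc ion binding", "DNA repair"]),
    Entry("EX-002", "kinase A", function=["ATP binding"]),
]
print(search(examples, "zinc binding"))
```

In this toy setup the entry named "uncharacterized protein" is still retrievable through its functional annotations, which is the point of indexing what an entry does rather than only what it is called.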

04

Principles

Index meaning, not only metadata

Biological entries are more than names, IDs, and keywords. Useful search must capture structure, function, relationships, provenance, and context.

Make data usable by agents

Future scientific tools will search, retrieve, reason, and refine. For that to work, datasets must be exposed in forms language models can navigate reliably.

Preserve expert control

Protplex is not a substitute for scientific interpretation. It is an interface for finding better candidates, surfacing context, and reducing manual search friction.

Keep primary sources central

Protplex adds a search and reasoning layer on top of existing resources. It does not aim to replace curated biological databases or their attribution.

05

Who is behind this?

Protplex is a Straintest Bio Initiative project. Its development and operation are handled by Straintest LLC, registered at Hagenholzstrasse 62, 8050 Zurich, Switzerland.

Straintest builds technology for interpreting complex real-world data: computer vision, spatial reasoning, large-scale processing, and multimodal analysis. Protplex applies the same technical direction to biology.

Learn more at straintest.co.

Get in touch

Reach us for collaborations, integrations, early access, or feedback.