Index meaning, not only metadata
Biological entries are more than names, IDs, and keywords. Useful search must capture structure, function, relationships, provenance, and context.
Straintest Bio Initiative
Building AI-native search infrastructure for biological datasets.
01
Biology has produced an enormous amount of high-signal data. Most of it was built for human lookup, not for AI systems that need to search, retrieve, compare, and reason across thousands of entries at once.
The core problem is fragmentation.
A single biological entry contains:
Existing interfaces surface only a fraction of this.
Protplex indexes all of it. Entries become discoverable by the signal they carry, not just the labels they have been assigned. Protplex makes existing databases easier to navigate, especially when a query does not map cleanly to a known entry or identifier.
02
Bringing biology into the agentic AI era is a data access problem.
AI coding tools became powerful because models could interact with code, documentation, repositories, and execution environments. Biology needs the same kind of accessible substrate: data that AI systems can search, retrieve, and reason over in loops.
Without it, models fall back on their training distribution. Retrieval skews toward well-known proteins, pathways, and organisms. Relevant but less visible entries stay buried.
Protplex is built on the following premise: retrieval should surface candidates by relevance, without bias against less visible entries.
03
We call textualization the process of converting what matters about a biological entry into rich text, built by combining data available from every modality.
For each entry, Protplex collects and indexes signals across:
The result is an entry that can be retrieved based on what it does and how it relates to other biology, not just what it is named.
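As a rough illustration of the idea, the sketch below renders a structured record as a single block of searchable text. It is not Protplex's implementation: the `Entry` class, its field names, the `textualize` function, and the placeholder values are all hypothetical, chosen only to show how signals from several fields might be combined into text.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical record; these field names are illustrative assumptions,
    # not the schema Protplex actually uses.
    entry_id: str
    name: str
    function: str = ""
    structure_notes: str = ""
    related_entries: list[str] = field(default_factory=list)
    source: str = ""


def textualize(entry: Entry) -> str:
    """Join the non-empty signals of an entry into one text passage."""
    parts = [f"{entry.name} ({entry.entry_id})."]
    if entry.function:
        parts.append(f"Function: {entry.function}.")
    if entry.structure_notes:
        parts.append(f"Structure: {entry.structure_notes}.")
    if entry.related_entries:
        parts.append("Related to: " + ", ".join(entry.related_entries) + ".")
    if entry.source:
        parts.append(f"Source: {entry.source}.")
    return " ".join(parts)


# Placeholder values only; not taken from any real database.
example = Entry(
    entry_id="X0001",
    name="example protein",
    function="binds ATP",
    related_entries=["X0002"],
)
print(textualize(example))
```

Text produced this way can be handed to a full-text or embedding index, so a query about function or relationships can match an entry even when it does not mention the entry's name or identifier.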
04
Future scientific tools will search, retrieve, reason, and refine. For that to work, datasets must be exposed in forms language models can navigate reliably.
Protplex is not a substitute for scientific interpretation. It is an interface for finding better candidates, surfacing context, and reducing manual search friction.
Protplex adds a search and reasoning layer on top of existing resources. It does not aim to replace curated biological databases or their attribution.
05
Protplex is a Straintest Bio Initiative project. Its development and operation are handled by Straintest LLC, registered at Hagenholzstrasse 62, 8050 Zurich, Switzerland.
Straintest builds technology for interpreting complex real-world data: computer vision, spatial reasoning, large-scale processing, and multimodal analysis. Protplex applies the same technical direction to biology.
Learn more at straintest.co.
Reach us for collaborations, integrations, early access, or feedback.