Why it matters
Proteins are the machinery of biology. Almost every disease, drug, and bodily function comes down to proteins: what they do, where they're expressed, and what happens when they malfunction. But protein data sits in specialist databases like UniProt and PDB that, although openly available, are hard for non-experts to navigate. Bringing the most important proteins into Geo as structured, linked entities connects molecular biology to diseases, drugs, and researchers in a way that anyone can navigate.
What to publish
Create Protein entities for the 200 most significant human proteins
For each protein, publish:
Name and common abbreviations (e.g. "Tumor protein p53" / "TP53" / "p53")
Description — what it does in plain language
Gene that encodes it
UniProt ID
PDB structure ID(s) if solved
Protein family or class (enzyme, receptor, transcription factor, structural, antibody, hormone, etc.)
Biological function
Tissue or organ where it's most expressed
Pathway(s) it participates in (e.g. apoptosis, insulin signaling, immune response)
Associated diseases when mutated or dysregulated — link to Disease entities
Drugs that target it — create or link to Drug entities
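The field list above can be sketched as a staging record that is checked before anything is published. This is a minimal sketch, not Geo's schema: the class name `ProteinRecord`, its field names, and the `problems` helper are hypothetical. The accession pattern follows the format UniProt documents for its accession numbers, and the PDB check covers only the classic four-character IDs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Accession format as documented in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Classic four-character PDB ID: a digit followed by three alphanumerics.
PDB_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


@dataclass
class ProteinRecord:
    """Hypothetical staging record; field names are illustrative only."""

    name: str
    abbreviations: list[str]
    description: str
    gene: str
    uniprot_id: str
    protein_class: str
    biological_function: str
    primary_tissue: str = ""
    pdb_ids: list[str] = field(default_factory=list)
    pathways: list[str] = field(default_factory=list)
    diseases: list[str] = field(default_factory=list)
    drugs: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means ready to publish."""
        issues = []
        required = ("name", "description", "gene", "protein_class", "biological_function")
        for label in required:
            if not getattr(self, label).strip():
                issues.append(f"missing {label}")
        if not UNIPROT_RE.match(self.uniprot_id):
            issues.append(f"malformed UniProt ID: {self.uniprot_id!r}")
        for pdb_id in self.pdb_ids:
            if not PDB_RE.match(pdb_id):
                issues.append(f"malformed PDB ID: {pdb_id!r}")
        return issues


p53 = ProteinRecord(
    name="Tumor protein p53",
    abbreviations=["TP53", "p53"],
    description="Tumor suppressor that stops damaged cells from dividing.",
    gene="TP53",
    uniprot_id="P04637",
    protein_class="transcription factor",
    biological_function="Triggers cell cycle arrest or apoptosis after DNA damage.",
    pdb_ids=["1TUP"],
    pathways=["apoptosis"],
    diseases=["Li-Fraumeni syndrome"],
)
print(p53.problems())
```

Optional fields default to empty so that a protein without a solved structure or an approved drug still passes validation.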
Create relations to:
Diseases linked to the protein — link to Disease entities
Biological pathways — create Topic entities (e.g. apoptosis, cell cycle, inflammation)
Related proteins in the same pathway or family
Key researchers who discovered or study the protein — link to Person entities
Create Topic entities for protein categories and pathways if they don't exist
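The "create if they don't exist" rule for Topic entities, together with the relations listed above, can be sketched as a get-or-create lookup keyed on entity type and normalized name. This is an in-memory illustration only: `Entity`, `Registry`, and the relation name `participates_in` are hypothetical and are not part of any Geo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_type: str  # e.g. "Protein", "Topic", "Disease", "Person"
    name: str


class Registry:
    """Hypothetical in-memory stand-in for the target knowledge graph."""

    def __init__(self):
        self._by_key = {}
        self.relations = []

    def get_or_create(self, entity_type, name):
        # Normalize the name so "Apoptosis" and " apoptosis " map to one entity.
        key = (entity_type, name.strip().casefold())
        if key not in self._by_key:
            self._by_key[key] = Entity(entity_type, name.strip())
        return self._by_key[key]

    def relate(self, source, relation_type, target):
        # Store each (source, relation, target) triple at most once.
        triple = (source, relation_type, target)
        if triple not in self.relations:
            self.relations.append(triple)
        return triple

    def entities_of(self, entity_type):
        return [e for e in self._by_key.values() if e.entity_type == entity_type]


registry = Registry()
p53 = registry.get_or_create("Protein", "p53")
bcl2 = registry.get_or_create("Protein", "BCL-2")

# Both proteins take part in apoptosis; the Topic is created once and reused.
registry.relate(p53, "participates_in", registry.get_or_create("Topic", "apoptosis"))
registry.relate(bcl2, "participates_in", registry.get_or_create("Topic", "Apoptosis "))

print(len(registry.entities_of("Topic")))
```

Against a real graph the lookup would be a search by name and type before creation, but the ordering is the same: look up first, create only on a miss.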
Scope
200 proteins. Prioritize by clinical and research significance:
Cancer-related (p53, BRCA1, BRCA2, HER2, EGFR, RAS, MYC, BCL-2)
Immune system (TNF-alpha, interleukins, PD-1, PD-L1, CD4, CD8, immunoglobulins)
Metabolic (insulin, leptin, AMPK, mTOR, GLUT4, hemoglobin, whose glycated form is measured as HbA1c)
Neurological (amyloid beta, tau, alpha-synuclein, BDNF, serotonin receptors, dopamine receptors)
Cardiovascular (troponin, ACE, ACE2, ApoB, apolipoprotein(a), the protein component of Lp(a), fibrinogen)
Structural (collagen, keratin, actin, myosin, elastin)
Signaling (EGFR, VEGF, Wnt, Notch, JAK, STAT)
Longevity-associated (sirtuins, telomerase, klotho, FOXO)
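Drug targets (COX-1/2, HMG-CoA reductase, ACE, HIV protease; HIV protease is a viral protein, so include it only if pathogen targets count toward the human-protein set)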
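Some names appear in more than one category above (EGFR and ACE, for example), so a naive count of the category lists overstates progress toward 200. A minimal sketch of merging the lists so that each protein becomes one entity tagged with all of its categories; the function name `merge_categories` is hypothetical and only four of the categories are shown.

```python
# Four of the category lists from the Scope section, copied as written.
CATEGORIES = {
    "Cancer-related": ["p53", "BRCA1", "BRCA2", "HER2", "EGFR", "RAS", "MYC", "BCL-2"],
    "Cardiovascular": ["troponin", "ACE", "ACE2", "ApoB", "Lp(a)", "fibrinogen"],
    "Signaling": ["EGFR", "VEGF", "Wnt", "Notch", "JAK", "STAT"],
    "Drug targets": ["COX-1/2", "HMG-CoA reductase", "HIV protease", "ACE"],
}


def merge_categories(categories):
    """Map each protein name to every category that lists it, in input order."""
    merged = {}
    for category, names in categories.items():
        for name in names:
            merged.setdefault(name, []).append(category)
    return merged


merged = merge_categories(CATEGORIES)
listed = sum(len(names) for names in CATEGORIES.values())
shared = {name: cats for name, cats in merged.items() if len(cats) > 1}

print(f"{listed} list entries, {len(merged)} unique proteins")
print(shared)
```

Names that stand for a family rather than a single protein (RAS, JAK, STAT, interleukins) still need to be expanded by hand into the specific members that count toward the total.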
Potential sources
UniProt, NCBI Gene, PDB, KEGG pathways, Reactome, Human Protein Atlas, PubMed reviews, DrugBank (for drug-target relationships), OMIM (for disease associations).
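For the first two sources, the identifiers collected per protein are enough to build source links. A minimal sketch, assuming the UniProt REST endpoint at rest.uniprot.org and RCSB structure pages at www.rcsb.org keep their current URL layout; the helper names are hypothetical, and the fetch function needs network access and returns the raw JSON without interpreting any fields.

```python
import json
from urllib.request import urlopen


def uniprot_json_url(accession):
    """URL of the UniProtKB entry in JSON form (URL layout is an assumption)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.json"


def rcsb_structure_url(pdb_id):
    """URL of the RCSB PDB page for a structure (URL layout is an assumption)."""
    return f"https://www.rcsb.org/structure/{pdb_id.upper()}"


def fetch_uniprot_entry(accession, timeout=10.0):
    """Download one UniProtKB entry and return the parsed JSON as a dict."""
    with urlopen(uniprot_json_url(accession), timeout=timeout) as response:
        return json.load(response)


print(uniprot_json_url("P04637"))
print(rcsb_structure_url("1tup"))
```

Storing the source URL alongside each published value makes it possible to re-check an entity later when the upstream record changes.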