Why it matters

Proteins are the machinery of biology. Almost every disease, drug, and bodily function comes down to proteins: what they do, where they're expressed, and what happens when they malfunction. But protein data lives in specialist databases like UniProt and PDB, which are openly available yet hard for non-experts to navigate. Bringing the most important proteins into Geo as structured, linked entities connects molecular biology to diseases, drugs, and researchers in a way that anyone can navigate.

What to publish

  • Create Protein entities for the 200 most significant human proteins

  • For each protein, publish:

    • Name and common abbreviations (e.g. "Tumor protein p53" / "TP53" / "p53")

    • Description — what it does in plain language

    • Gene that encodes it

    • UniProt ID

    • PDB structure ID(s) if solved

    • Protein family or class (enzyme, receptor, transcription factor, structural, antibody, hormone, etc.)

    • Biological function

    • Tissue or organ where it's most expressed

    • Pathway(s) it participates in (e.g. apoptosis, insulin signaling, immune response)

    • Associated diseases when mutated or dysregulated — link to Disease entities

    • Drugs that target it — create or link to Drug entities

  • Create relations to:

    • Diseases linked to the protein — link to Disease entities

    • Biological pathways — create Topic entities (e.g. apoptosis, cell cycle, inflammation)

    • Related proteins in the same pathway or family

    • Key researchers who discovered or study the protein — link to Person entities

  • Create Topic entities for protein categories and pathways if they don't exist
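
As a rough illustration of the per-protein fields listed above, the record could be staged in a small Python structure and checked before publishing. This is only a sketch: the `ProteinRecord` class, its field names, and the `validate` helper are hypothetical and are not part of Geo. The accession pattern follows UniProt's documented accession format, and the PDB pattern assumes the classic four-character IDs only.

```python
import re
from dataclasses import dataclass, field

# UniProt accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: classic 4-character PDB IDs (a digit followed by 3 alphanumerics).
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


@dataclass
class ProteinRecord:
    """Hypothetical staging record mirroring the bullet list above."""
    name: str
    abbreviations: list[str]
    description: str
    gene: str
    uniprot_id: str
    protein_class: str
    biological_function: str
    pdb_ids: list[str] = field(default_factory=list)
    tissues: list[str] = field(default_factory=list)
    pathways: list[str] = field(default_factory=list)
    diseases: list[str] = field(default_factory=list)
    drugs: list[str] = field(default_factory=list)
    related_proteins: list[str] = field(default_factory=list)
    researchers: list[str] = field(default_factory=list)


def validate(record: ProteinRecord) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    required = ("name", "description", "gene", "uniprot_id",
                "protein_class", "biological_function")
    for attr in required:
        if not getattr(record, attr).strip():
            problems.append(f"missing {attr}")
    if record.uniprot_id and not UNIPROT_ACCESSION.fullmatch(record.uniprot_id):
        problems.append(f"malformed UniProt ID: {record.uniprot_id}")
    for pdb_id in record.pdb_ids:
        if not PDB_ID.fullmatch(pdb_id):
            problems.append(f"malformed PDB ID: {pdb_id}")
    return problems


p53 = ProteinRecord(
    name="Tumor protein p53",
    abbreviations=["TP53", "p53"],
    description="Stops damaged cells from dividing and can trigger cell death.",
    gene="TP53",
    uniprot_id="P04637",
    protein_class="transcription factor",
    biological_function="Tumor suppression",
    pdb_ids=["1TUP"],
    pathways=["apoptosis", "cell cycle"],
)
print(validate(p53))  # prints []
```

The list-valued fields hold plain names here; in practice each would presumably become a relation to a Disease, Topic, Person, or other entity, which this sketch does not attempt to model.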

Scope

200 proteins. Prioritize by clinical and research significance. The categories below overlap (EGFR and ACE each appear twice), so count each protein once:

  • Cancer-related (p53, BRCA1, BRCA2, HER2, EGFR, RAS, MYC, BCL-2)

  • Immune system (TNF-alpha, interleukins, PD-1, PD-L1, CD4, CD8, immunoglobulins)

  • Metabolic (insulin, leptin, AMPK, mTOR, GLUT4, hemoglobin, whose glycated form is measured as HbA1c)

  • Neurological (amyloid beta, tau, alpha-synuclein, BDNF, serotonin receptors, dopamine receptors)

  • Cardiovascular (troponin, ACE, ACE2, ApoB, Lp(a), fibrinogen)

  • Structural (collagen, keratin, actin, myosin, elastin)

  • Signaling (EGFR, VEGF, Wnt, Notch, JAK, STAT)

  • Longevity-associated (sirtuins, telomerase, klotho, FOXO)

  • Drug targets (COX-1/2, HMG-CoA reductase, HIV protease, ACE)

Potential sources

UniProt, NCBI Gene, PDB, KEGG pathways, Reactome, Human Protein Atlas, PubMed reviews, DrugBank (for drug-target relationships), OMIM (for disease associations).
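
As one possible starting point for the UniProt source, the sketch below pulls the name, gene, and PDB cross-references out of a UniProt entry in JSON form. The key names reflect the UniProt REST JSON layout as best understood here and should be checked against a live response. `SAMPLE_ENTRY` is a hand-trimmed illustration, not a full record, and the helper names are hypothetical. No network call is made; `entry_url` only builds the address an entry would be fetched from.

```python
from typing import Any

# UniProt REST endpoint for a single entry in JSON form.
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def entry_url(accession: str) -> str:
    """Build the URL for one UniProt entry (nothing is fetched here)."""
    return UNIPROT_ENTRY_URL.format(accession=accession)


def extract_fields(entry: dict[str, Any]) -> dict[str, Any]:
    """Pick out the fields this task needs; absent values become None or []."""
    name = (
        entry.get("proteinDescription", {})
        .get("recommendedName", {})
        .get("fullName", {})
        .get("value")
    )
    genes = [
        g["geneName"]["value"]
        for g in entry.get("genes", [])
        if "geneName" in g
    ]
    pdb_ids = [
        ref["id"]
        for ref in entry.get("uniProtKBCrossReferences", [])
        if ref.get("database") == "PDB"
    ]
    return {
        "uniprot_id": entry.get("primaryAccession"),
        "name": name,
        "gene": genes[0] if genes else None,
        "pdb_ids": pdb_ids,
    }


# Hand-trimmed illustration of an entry's shape, not a complete UniProt record.
SAMPLE_ENTRY = {
    "primaryAccession": "P04637",
    "proteinDescription": {
        "recommendedName": {"fullName": {"value": "Cellular tumor antigen p53"}}
    },
    "genes": [{"geneName": {"value": "TP53"}}],
    "uniProtKBCrossReferences": [
        {"database": "PDB", "id": "1TUP"},
        {"database": "MIM", "id": "191170"},
    ],
}

print(entry_url("P04637"))
print(extract_fields(SAMPLE_ENTRY))
```

Fields that UniProt does not cover well, such as plain-language descriptions, drug targets, and key researchers, would still need the other sources listed above.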