
Some Questions for the Design Process

Updated 2026-02-20 22:29
**1. High-level Questions (these can be further broken down)**

Short version

1. **External data ingestion & persistence**
   - What external sources are we pulling data from, and **where will the raw/landing data be stored** (DB, object storage, file system, etc.)?
   - What is the **system of record** for this data (which database/technology), and why?
2. **Agentic AI data access path**
   - For each agentic AI component, **what data does it need**, and **where will it read that data from** (raw zone, curated zone, SKG, etc.)?
   - What access patterns are expected (batch, query, streaming)?
3. **Approved/vetted data write-back**
   - When the research team generates and approves ("vets") data, **where is that approved data written to**?
   - How do we manage **versioning, lineage, and auditability** for approved outputs?
4. **Incremental / near real-time publishing from approvals**
   - Can we support **incremental updates** (near real-time push) as data is approved?
   - What would the **proposed architecture** look like (events/streaming, CDC, micro-batches), and what latency SLA are we targeting?
5. **Publishing options for downstream consumers**
   - What are the supported ways to **publish approved data** for downstream systems/users (APIs, streaming topics, database views, file exports, etc.)?
   - Which consumers need which mechanism, and what SLAs apply?
6. **API read model and source of truth**
   - For APIs that expose approved data, **what is the backing store they read from** (central SKG, our own serving store, cache)?
   - Do we need a separate **serving layer** to meet performance/availability requirements?
7. **Central SKG vs. owned serving store (tradeoffs)**
   - What are the **pros/cons** of Multum APIs reading directly from the **central SKG** vs. reading from a **domain-owned store** (with the central SKG as the FHRC-compliance publish target)?
8. **Ownership & operating model (to close the loop)**
   - What are the boundary conditions for each layer (ingestion, storage, approval workflow, publishing, APIs), and what are the **run/operational responsibilities** (monitoring, access control, incident response, data quality)?
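
To make questions 3 and 4 more concrete for discussion, the sketch below shows one possible shape for approved-data write-back: an append-only store that assigns a new version on every approval, records lineage back to raw inputs, and notifies subscribers as each approval lands. It is only an illustration under stated assumptions, not a proposed architecture. Every name in it (`ApprovedRecord`, `ApprovedStore`, `source_ids`, the `drug-123` entity, the `raw/vendor-a/...` keys) is hypothetical, and the in-memory dictionary and callback list stand in for whatever database and event/streaming mechanism is eventually chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ApprovedRecord:
    """One approved version of an entity (hypothetical schema)."""
    entity_id: str
    version: int                  # increases by 1 per approval of the same entity
    payload: dict                 # the vetted data itself
    source_ids: Tuple[str, ...]   # lineage: raw/landing items this was derived from
    approved_by: str              # audit: who approved it
    approved_at: datetime         # audit: when it was approved (UTC)


class ApprovedStore:
    """In-memory stand-in for the approved-data store (assumption, not a real system).

    Approvals are appended, never overwritten, so the full history stays auditable.
    """

    def __init__(self) -> None:
        self._history: Dict[str, List[ApprovedRecord]] = {}
        self._subscribers: List[Callable[[ApprovedRecord], None]] = []

    def subscribe(self, handler: Callable[[ApprovedRecord], None]) -> None:
        """Register a downstream publisher (stand-in for a topic or CDC consumer)."""
        self._subscribers.append(handler)

    def approve(self, entity_id: str, payload: dict,
                source_ids: List[str], approved_by: str) -> ApprovedRecord:
        """Write a new approved version and push it to subscribers immediately."""
        versions = self._history.setdefault(entity_id, [])
        record = ApprovedRecord(
            entity_id=entity_id,
            version=len(versions) + 1,
            payload=dict(payload),
            source_ids=tuple(source_ids),
            approved_by=approved_by,
            approved_at=datetime.now(timezone.utc),
        )
        versions.append(record)
        for handler in self._subscribers:
            handler(record)  # incremental publish: one event per approval
        return record

    def latest(self, entity_id: str) -> ApprovedRecord:
        """Return the most recently approved version of an entity."""
        return self._history[entity_id][-1]

    def history(self, entity_id: str) -> List[ApprovedRecord]:
        """Return all approved versions, oldest first."""
        return list(self._history.get(entity_id, []))


# Usage: two approvals of the same entity, with a list acting as the downstream consumer.
store = ApprovedStore()
published: List[ApprovedRecord] = []
store.subscribe(published.append)

store.approve("drug-123", {"name": "example"}, ["raw/vendor-a/0001"], "reviewer-1")
store.approve("drug-123", {"name": "example, revised"}, ["raw/vendor-a/0002"], "reviewer-2")
```

In this sketch the subscriber callback is synchronous, so a slow consumer would block approvals. A real design would likely decouple the two (queue, topic, or micro-batch), which is exactly the latency and SLA tradeoff question 4 asks about.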