Some Questions for the Design Process
**1. High-level Questions (these can be broken down further)**
Short version
1. **External data ingestion & persistence**
- What external sources are we pulling data from, and **where will the raw/landing data be stored** (DB, object storage, file system, etc.)?
- What is the **system of record** for this data (which database/technology), and why?
2. **Agentic AI data access path**
- For each agentic AI component, **what data does it need**, and **where will it read that data from** (raw zone, curated zone, SKG, etc.)?
- What access patterns are expected (batch, query, streaming)?
3. **Approved/vetted data write-back**
- When the research team generates and approves ("vets") data, **where is that approved data written to**?
- How do we manage **versioning, lineage, and auditability** for approved outputs?
4. **Incremental / near real-time publishing from approvals**
- Can we support **incremental updates** (near real-time push) as data is approved?
- What would the **proposed architecture** look like (events/streaming, CDC, micro-batches), and what latency SLA are we targeting?
5. **Publishing options for downstream consumers**
- What are the supported ways to **publish approved data** for downstream systems/users (APIs, streaming topics, database views, file exports, etc.)?
- Which consumers need which mechanism, and what SLAs apply?
6. **API read model and source of truth**
- For APIs that expose approved data, **what is the backing store they read from** (central SKG, our own serving store, cache)?
- Do we need a separate **serving layer** to meet performance/availability?
7. **Central SKG vs. owned serving store (tradeoffs)**
- What are the **pros/cons** of Multum APIs reading directly from the **central SKG** vs. reading from a **domain-owned store** (with the central SKG as the FHRC-compliance publish target)?
8. **Ownership & operating model (to close the loop)**
- What are the **boundary conditions** for each layer (ingestion, storage, approval workflow, publishing, APIs), and what are the **run/operational responsibilities** (monitoring, access control, incident response, data quality)?
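
To make questions 3 and 4 more concrete, here is a minimal Python sketch of one possible shape for an approved-data write-back that keeps versions, lineage, and an audit trail, and notifies subscribers as each approval lands. Everything in it is an assumption for discussion only: the names `ApprovedRecord`, `ApprovedStore`, `approve`, and `subscribe`, the field set, and the in-memory storage are all hypothetical and do not describe any existing system or a chosen design.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovedRecord:
    """Hypothetical envelope for one approved ("vetted") version of a record."""
    record_id: str
    version: int          # versioning: increases by 1 on every approval
    payload: dict
    approved_by: str      # auditability: who approved
    approved_at: str      # auditability: when (UTC, ISO 8601)
    source_ids: tuple     # lineage: which upstream inputs this was derived from
    checksum: str         # SHA-256 of the canonical JSON payload


class ApprovedStore:
    """In-memory stand-in for the approved-data store (assumption, not a real backend)."""

    def __init__(self):
        self._versions = {}      # record_id -> list of ApprovedRecord, oldest first
        self._subscribers = []   # callables invoked once per approval

    def subscribe(self, handler):
        """Register a downstream consumer for incremental updates."""
        self._subscribers.append(handler)

    def approve(self, record_id, payload, approved_by, source_ids):
        """Append a new immutable version and push it to all subscribers."""
        history = self._versions.setdefault(record_id, [])
        canonical = json.dumps(payload, sort_keys=True)
        record = ApprovedRecord(
            record_id=record_id,
            version=len(history) + 1,
            payload=payload,
            approved_by=approved_by,
            approved_at=datetime.now(timezone.utc).isoformat(),
            source_ids=tuple(source_ids),
            checksum=hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        )
        history.append(record)   # earlier versions are never overwritten
        for handler in self._subscribers:
            handler(record)      # synchronous push; a real system might use a queue or topic
        return record

    def latest(self, record_id):
        """Return the most recently approved version."""
        return self._versions[record_id][-1]

    def history(self, record_id):
        """Return all approved versions, oldest first."""
        return list(self._versions[record_id])


if __name__ == "__main__":
    store = ApprovedStore()
    store.subscribe(lambda rec: print("published", rec.record_id, "v", rec.version))
    store.approve("rec-1", {"name": "example"}, "reviewer-a", ["src-1"])
    store.approve("rec-1", {"name": "example, revised"}, "reviewer-b", ["src-1", "src-2"])
```

The append-only history is one way to answer the versioning and audit questions; whether the push step becomes events/streaming, CDC, or micro-batches is exactly what question 4 leaves open.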