Some Questions for the Design Process

Updated 2026-02-20 22:29
**1. High-level Questions (these can be broken down further)**

Short version

1. **External data ingestion & persistence**

    - What external sources are we pulling data from, and **where will the raw/landing data be stored** (DB, object storage, file system, etc.)?

    - What is the **system of record** for this data (which database/technology), and why?

2. **Agentic AI data access path**

    - For each agentic AI component, **what data does it need**, and **where will it read that data from** (raw zone, curated zone, SKG, etc.)?

    - What access patterns are expected (batch, query, streaming)?

3. **Approved/vetted data write-back**

    - When the research team generates and approves ("vets") data, **where is that approved data written to**?

    - How do we manage **versioning, lineage, and auditability** for approved outputs?

4. **Incremental / near real-time publishing from approvals**

    - Can we support **incremental updates** (near real-time push) as data is approved?

    - What would the **proposed architecture** look like (events/streaming, CDC, micro-batches), and what latency SLA are we targeting?

5. **Publishing options for downstream consumers**

    - What are the supported ways to **publish approved data** for downstream systems/users (APIs, streaming topics, database views, file exports, etc.)?

    - Which consumers need which mechanism, and what SLAs apply?

6. **API read model and source of truth**

    - For APIs that expose approved data, **what is the backing store they read from** (central SKG, our own serving store, cache)?

    - Do we need a separate **serving layer** to meet performance/availability?

7. **Central SKG vs. owned serving store (tradeoffs)**

    - What are the **pros/cons** of Multum APIs reading directly from the **central SKG** vs. reading from a **domain-owned store** (with central SKG as the FHRC-compliance publish target)?

8. **Ownership & operating model (to close the loop)**

    - What are the **boundary conditions** for each layer (ingestion, storage, approval workflow, publishing, APIs), and what are the **run/operational responsibilities** (monitoring, access control, incident response, data quality)?
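To make questions 3, 4 and 6 more concrete for discussion, here is a minimal Python sketch. It is not a proposed design. It assumes that each approval is emitted as an event that carries a per-record version number and the IDs of the upstream raw records it was derived from, and that a domain-owned serving store applies those events incrementally. `ApprovalEvent`, `ServingStore`, `apply` and `lineage` are hypothetical names that do not refer to any existing system. The sketch shows one way to make incremental updates idempotent and safe against out-of-order delivery while keeping an append-only audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalEvent:
    """Hypothetical event emitted when the research team approves ("vets") a record."""
    record_id: str
    version: int        # assumption: monotonically increasing per record_id
    payload: dict
    approved_by: str
    source_ids: tuple   # lineage: upstream raw/landing record IDs this output came from
    approved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ServingStore:
    """Hypothetical domain-owned serving store that applies approvals incrementally."""

    def __init__(self):
        self.current = {}    # record_id -> latest ApprovalEvent (the read model APIs query)
        self.audit_log = []  # append-only history of every event received

    def apply(self, event: ApprovalEvent) -> bool:
        """Apply one event idempotently; return True if the read model changed."""
        # Record every event for auditability, including duplicates and stale ones.
        self.audit_log.append(event)
        existing = self.current.get(event.record_id)
        if existing is not None and existing.version >= event.version:
            # Duplicate or out-of-order delivery: keep the newer version already stored.
            return False
        self.current[event.record_id] = event
        return True

    def lineage(self, record_id: str) -> tuple:
        """Return the upstream source IDs of the current approved version."""
        return self.current[record_id].source_ids


if __name__ == "__main__":
    store = ServingStore()
    v1 = ApprovalEvent("rec-123", 1, {"status": "draft-approved"}, "reviewer1", ("raw-9",))
    v2 = ApprovalEvent("rec-123", 2, {"status": "final"}, "reviewer2", ("raw-9", "raw-10"))
    print(store.apply(v1))                     # True
    print(store.apply(v2))                     # True
    print(store.apply(v1))                     # False (replayed older version)
    print(store.current["rec-123"].payload)    # {'status': 'final'}
    print(store.lineage("rec-123"))            # ('raw-9', 'raw-10')
    print(len(store.audit_log))                # 3
```

Separating the current-state map from the audit log mirrors the split between an API read model (question 6) and the history needed for versioning, lineage and auditability (question 3). In a real deployment the in-memory structures would be replaced by whichever stores and transport (events/streaming, CDC, micro-batches) the team selects.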