Vertex AI manages model versioning and lifecycle through a registry-centric workflow that tracks artifacts from training to serving. When you train or import a model, it lands in the Model Registry as a uniquely numbered version with metadata (creator, framework, hyperparameters, metrics, lineage). You can promote versions by assigning aliases such as “staging,” “candidate,” and “production,” attach evaluation reports, and label them so CI/CD automation can pick them up. An endpoint can host multiple versions at the same time, enabling traffic splitting for canary and A/B tests without redeploying infrastructure.
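As a concrete sketch of that flow, assuming the google-cloud-aiplatform Python SDK, a training pipeline could register its output as a new version under an existing registry entry and express the promotion state as a version alias; the project, bucket path, parent model ID, container image, and labels below are illustrative placeholders:

```python
# Minimal sketch: register a run's artifacts as a new version of an existing
# model in the Vertex AI Model Registry. All IDs, URIs, and labels here are
# placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

candidate = aiplatform.Model.upload(
    display_name="churn-classifier",
    # Uploading with parent_model creates a new version under that entry.
    parent_model="projects/my-project/locations/us-central1/models/1234567890",
    artifact_uri="gs://my-bucket/churn/run-42/model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
    is_default_version=False,        # keep the incumbent as the default version
    version_aliases=["candidate"],   # promotion state expressed as an alias
    labels={"git_sha": "a1b2c3d", "training_run": "run-42"},
)
print(candidate.resource_name, candidate.version_id)
```

Treating promotion as alias movement keeps the rollout declarative: CI/CD only re-points aliases rather than re-uploading artifacts.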
During deployment, you can pin endpoint traffic weights (e.g., 90% to v2, 10% to v3), collect latency and accuracy metrics per deployed version, and roll traffic forward or back with a single traffic-split update. Artifact lineage connects datasets, code, and pipelines, which is useful for audits and reproducibility. Scheduled evaluation jobs can run on holdout data or production feedback to detect quality regressions over time. If a new training run beats the incumbent on predefined metrics, a pipeline can automatically register the candidate, push a small traffic slice to it, and, if SLOs hold, gradually raise its share to 100%.
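A hedged sketch of that promotion loop with the same SDK: the resource names, display name, and SLO check are placeholders, the @candidate suffix selects the aliased version, and the final step assumes Endpoint.update accepts a traffic_split map (available in recent SDK releases).

```python
# Canary rollout sketch: deploy a candidate version next to the incumbent,
# gate on SLOs, then promote or roll back. Resource names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/987654321"
)
candidate = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890@candidate"
)

# Send 10% of traffic to the canary; existing deployed models keep the rest.
candidate.deploy(
    endpoint=endpoint,
    deployed_model_display_name="churn-v3-canary",
    machine_type="n1-standard-4",
    min_replica_count=1,
    traffic_percentage=10,
)

canary = next(
    m for m in endpoint.list_models() if m.display_name == "churn-v3-canary"
)

def slos_hold() -> bool:
    """Placeholder: poll your monitoring / evaluation jobs for latency and quality."""
    return True

if slos_hold():
    # Promote: deployed models missing from the traffic map receive no traffic.
    endpoint.update(traffic_split={canary.id: 100})
else:
    # Roll back: undeploying the canary returns traffic to the incumbent.
    endpoint.undeploy(deployed_model_id=canary.id)
```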
Good hygiene is straightforward but critical: freeze training data snapshots, record feature schemas, and attach git SHAs and container digests to each model version. Keep a deprecation policy for old versions with a grace period, and export model cards or evaluation summaries as artifacts in the registry. If your system uses Milvus for retrieval, treat the embedding model as part of the lifecycle: version the embedding model, the Milvus index parameters, and the index snapshot together. For safe rollouts, run dual indexes (old/new embeddings) in parallel and split retrieval traffic the same way you split endpoint traffic for generation.
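A sketch of that dual-index pattern with pymilvus, assuming two already-built collections for the old and new embeddings; the collection names, field names, search params, alias, and the embed_old/embed_new callables are placeholders:

```python
# Dual-index retrieval sketch: route a canary share of queries to the
# collection built with the new embedding model, then flip a stable alias
# once it wins. Names and parameters below are placeholders.
import random
from pymilvus import Collection, connections, utility

connections.connect(host="localhost", port="19530")

OLD, NEW, CANARY_SHARE = "docs_embed_v1", "docs_embed_v2", 0.10

for name in (OLD, NEW):
    Collection(name).load()  # search requires loaded collections

def retrieve(query_text, embed_old, embed_new, top_k=5):
    """Mirror the generation endpoint's traffic split on the retrieval side.
    The query must be embedded with the model that built the chosen index."""
    if random.random() < CANARY_SHARE:
        name, vec = NEW, embed_new(query_text)
    else:
        name, vec = OLD, embed_old(query_text)
    hits = Collection(name).search(
        data=[vec],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=top_k,
        output_fields=["doc_id"],
    )
    return name, hits

# After validation, repoint a stable alias so callers never hard-code
# collection names. create_alias is the one-time setup; alter_alias moves it.
# utility.create_alias(collection_name=OLD, alias="docs_serving")
utility.alter_alias(collection_name=NEW, alias="docs_serving")
```

Keeping query embedding and index selection coupled in one routine is the design point: it guarantees the retrieval split never mixes a new query vector with an old index, which would silently degrade recall.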
