Model Storing and Serving
We will show how you can use DLF to store and serve trained models using S3 buckets.
Requirements
You have permissions in a namespace you can use for Kubeflow (to create TFJobs, Deployments, etc.).
Let's assume the namespace you can use is {my-namespace}. Feel free to change it accordingly.
Make sure you first follow the guide for Installation; a quick way to verify the prerequisites is sketched below.
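As a sanity check (a minimal sketch; the CRD name below matches the apiVersion used later in this guide but may differ in older DLF releases, and {my-namespace} is just the placeholder used throughout):

# Verify the Dataset CRD installed by DLF/Datashim is available
kubectl get crd datasets.datashim.io

# Create the namespace if it does not already exist
kubectl create namespace {my-namespace}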
We will loosely follow the example posted in mnist_vanilla_k8s.ipynb
NOTE: All example yaml files mentioned in the wiki are also available in examples/kubeflow
Build model container
There is a delta between the existing distributed MNIST examples and what is needed to run well as a TFJob. We will skip the Kaniko part and simply build and push the image using the Dockerfile and model.py in examples/kubeflow:
cd examples/kubeflow
docker build -t {MY-REGISTRY}/mnist-model -f Dockerfile.model .
docker push {MY-REGISTRY}/mnist-model
If you use an authenticated registry, follow the instructions in configure-docker-credentials.
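As a minimal sketch of what that involves (the secret name and credential placeholders below are illustrative, not values from this guide), you would create an image pull secret in your namespace and reference it from your pod specs:

kubectl create secret docker-registry my-registry-secret \
  --docker-server={MY-REGISTRY} \
  --docker-username=<username> \
  --docker-password=<password> \
  -n {my-namespace}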
Create an S3 Bucket and its Dataset
If you have an existing S3 bucket you can use, proceed with that one. Otherwise, follow the instructions in Configure IBM COS Storage.
Now we need to create a Dataset that points to the newly created bucket. Create a file (e.g. my-dataset.yaml) that looks like this:
apiVersion: datashim.io/v1alpha1
kind: Dataset
metadata:
  name: your-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "access_key_id"
    secretAccessKey: "secret_access_key"
    endpoint: "https://YOUR_ENDPOINT"
    bucket: "YOUR_BUCKET"
    region: "" # it can be empty
kubectl create -f my-dataset.yaml -n {my-namespace}
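Once the Dataset is created, DLF provisions a PVC with the same name, which is what the pods below mount. Assuming the names used above, you can verify both objects:

# The Dataset should appear, and a PVC named after it should become Bound
kubectl get dataset your-dataset -n {my-namespace}
kubectl get pvc your-dataset -n {my-namespace}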
Launch a TFJob
Now we are ready to launch a TFJob in a much less verbose way, since DLF takes care of mounting the dataset and giving the TensorFlow pods access to it. Save the following as tfjob.yaml:
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: my-train
spec:
  tfReplicaSpecs:
    Ps:
      replicas: 1
      template:
        metadata:
          labels:
            dataset.0.id: "your-dataset"
            dataset.0.useas: "mount"
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir=/mnt/datasets/your-dataset/mnist
            - --tf-export-dir=/mnt/datasets/your-dataset/mnist/export
            - --tf-train-steps=200
            - --tf-batch-size=100
            - --tf-learning-rate=0.1
            image: yiannisgkoufas/mnist
            workingDir: /opt
            resources:
              limits:
                ephemeral-storage: "10Gi"
              requests:
                ephemeral-storage: "10Gi"
          restartPolicy: OnFailure
    Chief:
      replicas: 1
      template:
        metadata:
          labels:
            dataset.0.id: "your-dataset"
            dataset.0.useas: "mount"
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            resources:
              limits:
                ephemeral-storage: "10Gi"
              requests:
                ephemeral-storage: "10Gi"
            command:
            - python
            - /opt/model.py
            - --tf-model-dir=/mnt/datasets/your-dataset/mnist
            - --tf-export-dir=/mnt/datasets/your-dataset/mnist/export
            - --tf-train-steps=200
            - --tf-batch-size=100
            - --tf-learning-rate=0.1
            image: yiannisgkoufas/mnist
          restartPolicy: OnFailure
    Worker:
      replicas: 1
      template:
        metadata:
          labels:
            dataset.0.id: "your-dataset"
            dataset.0.useas: "mount"
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccount: default-editor
          containers:
          - name: tensorflow
            command:
            - python
            - /opt/model.py
            - --tf-model-dir=/mnt/datasets/your-dataset/mnist
            - --tf-export-dir=/mnt/datasets/your-dataset/mnist/export
            - --tf-train-steps=200
            - --tf-batch-size=100
            - --tf-learning-rate=0.1
            image: yiannisgkoufas/mnist
            workingDir: /opt
          restartPolicy: OnFailure
Make sure you replace your-dataset with the name of your Dataset (and, if you built and pushed your own image above, point image to {MY-REGISTRY}/mnist-model). Then create the TFJob:
kubectl create -f tfjob.yaml -n {my-namespace}
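To follow the training run (a sketch; the pod name below follows the usual TFJob naming convention of <job-name>-<replica-type>-<index> and may differ in your cluster):

kubectl get tfjobs -n {my-namespace}
kubectl logs -f my-train-chief-0 -n {my-namespace}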
View the Model in Tensorboard
You can inspect the model you trained and stored in the remote S3 bucket by creating the following YAML file (e.g. tensorboard.yaml), which again leverages the Dataset you created:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist-tensorboard
  name: mnist-tensorboard
spec:
  selector:
    matchLabels:
      app: mnist-tensorboard
  template:
    metadata:
      labels:
        app: mnist-tensorboard
        version: v1
        dataset.0.id: "your-dataset"
        dataset.0.useas: "mount"
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccount: default-editor
      containers:
      - command:
        - /usr/local/bin/tensorboard
        - --logdir=/mnt/datasets/your-dataset/mnist
        - --port=80
        image: tensorflow/tensorflow:1.15.2-py3
        name: tensorboard
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist-tensorboard
  name: mnist-tensorboard
spec:
  ports:
  - name: http-tb
    port: 80
    targetPort: 80
  selector:
    app: mnist-tensorboard
  type: ClusterIP
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mnist-tensorboard
spec:
  gateways:
  - kubeflow/kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /mnist/default/tensorboard/
    rewrite:
      uri: /
    route:
    - destination:
        host: mnist-tensorboard.default.svc.cluster.local
        port:
          number: 80
    timeout: 300s
kubectl create -f tensorboard.yaml -n {my-namespace}
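If you prefer not to go through the Istio gateway (the VirtualService above routes to the default namespace, so adjust the host and prefix if you deploy elsewhere), a simple port-forward reaches the TensorBoard UI; the local port below is arbitrary:

kubectl port-forward svc/mnist-tensorboard 8080:80 -n {my-namespace}
# then open http://localhost:8080 in your browser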
Model Serving Using KFServing
You can leverage DLF to run an inference service on the model you trained using KFServing. Save the following as kfserving-inference.yaml:
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
name: "mnist-sample"
spec:
default:
predictor:
tensorflow:
storageUri: "pvc://your-dataset/mnist/export"
kubectl create -f kfserving-inference.yaml -n {my-namespace}
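The pvc:// URI works because DLF exposes the Dataset as a PVC of the same name, so KFServing can read the exported model directly from it. To check when the service is ready (KFServing populates the URL once the predictor is up):

kubectl get inferenceservice mnist-sample -n {my-namespace}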
Model Serving Using TensorFlow Serving
Again, you can leverage DLF to serve the model you trained, this time with a plain TensorFlow Serving deployment. Save the following as tensorflow-serving.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: tensorflow-serving
spec:
  selector:
    matchLabels:
      app: mnist-model
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: mnist-model
        version: v1
        dataset.0.id: "your-dataset"
        dataset.0.useas: "mount"
    spec:
      serviceAccount: default-editor
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=/mnt/datasets/your-dataset/mnist/export
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: modelBasePath
          value: /mnt/datasets/your-dataset/mnist/export
        image: tensorflow/serving:1.15.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: model-config
      volumes:
      - configMap:
          name: tensorflow-serving
        name: model-config
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/path: /monitoring/prometheus/metrics
    prometheus.io/port: "8500"
    prometheus.io/scrape: "true"
  labels:
    app: mnist-model
  name: tensorflow-serving
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist-model
  type: ClusterIP
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: tensorflow-serving
data:
  monitoring_config.txt: |-
    prometheus_config: {
      enable: true,
      path: "/monitoring/prometheus/metrics"
    }
kubectl create -f tensorflow-serving.yaml -n {my-namespace}
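Once the deployment is running, a quick way to confirm the model was loaded from the Dataset mount is to port-forward the REST port and query TensorFlow Serving's model status endpoint (the local port below is arbitrary):

kubectl port-forward svc/tensorflow-serving 8500:8500 -n {my-namespace}
# in another terminal
curl http://localhost:8500/v1/models/mnist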
If you want to deploy the demo with the MNIST UI, follow the instructions in MNIST UI.