Using Docker, I've created a MinIO S3 artifact store and a MySQL backend store with the following Docker Compose file:
version: '3.8'
- '3306'
- '(path)/server_backend:/var/lib/mysql'
image: 'mysql'
container_name: db
- '9000'
- '9000:9000'
- db
command: server /data
- '(path)/server_artifact:/data'
image: minio/minio:RELEASE.2021-02-14T04-01-33Z
container_name: MinIO
build: ./mlflow
- '5000'
- '5000:5000'
- storage
image: 'mlflow:Dockerfile'
container_name: server
The MLflow server image was built from the following Dockerfile:
FROM python:3.8-slim-buster
WORKDIR /usr/src/app
RUN pip install cryptography mlflow psycopg2-binary boto3 pymysql
ENV MLFLOW_S3_ENDPOINT_URL=http://storage:9000
CMD mlflow server \
--backend-store-uri mysql+pymysql://MLFLOW:temporal@db:3306/DBMLFLOW \
--default-artifact-root s3://artifacts \
The credentials are defined in a .env file.
The output of the docker-compose up command:
[+] Running 21/22
- mlflow Error 5.6s
- storage Pulled 36.9s
- a6b97b4963f5 Pull complete 24.6s
- 13948a011eec Pull complete 24.7s
- 40cdef9976a6 Pull complete 24.7s
- f47162848743 Pull complete 24.8s
- 5f2758d8e94c Pull complete 24.9s
- c2950439edb8 Pull complete 25.0s
- 1b08f8a15998 Pull complete 30.7s
- db Pulled 45.8s
- 07aded7c29c6 Already exists 0.0s
- f68b8cbd22de Pull complete 0.7s
- 30c1754a28c4 Pull complete 2.1s
- 1b7cb4d6fe05 Pull complete 2.2s
- 79a41dc56b9a Pull complete 2.3s
- 00a75e3842fb Pull complete 6.7s
- b36a6919c217 Pull complete 6.8s
- 635b0b84d686 Pull complete 6.8s
- 6d24c7242d02 Pull complete 39.4s
- 5be6c5edf16f Pull complete 39.5s
- cb35eac1242c Pull complete 39.5s
- a573d4e1c407 Pull complete 39.6s
[+] Building 1.4s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.8-slim-buster 1.3s
=> [1/3] FROM docker.io/library/python:3.8-slim-buster@sha256:13a3f2bffb4b18ff7eda2763a3b0ba316dd82e548f52ea8b4fd11c94b97afa7d 0.0s
=> CACHED [2/3] WORKDIR /usr/src/app 0.0s
=> CACHED [3/3] RUN pip install cryptography mlflow psycopg2-binary boto3 pymysql 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:76d4e4462b5c7c1826734e59a54488b56660de0dd5ecc188c308202608a8f20b 0.0s
=> => naming to docker.io/library/mlflow:Dockerfile 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 3/3
- Container db Created 0.5s
- Container MinIO Created 0.1s
- Container server Created 0.1s
Attaching to server, MinIO, db
db | 2021-10-06 12:12:57+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
db | 2021-10-06 12:12:57+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
db | 2021-10-06 12:12:57+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
db | 2021-10-06 12:12:57+00:00 [Note] [Entrypoint]: Initializing database files
db | 2021-10-06T12:12:57.679527Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.26) initializing of server in progress as process 44
db | 2021-10-06T12:12:57.687748Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
db | 2021-10-06T12:12:58.230036Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
db | 2021-10-06T12:12:59.888820Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
db | 2021-10-06T12:12:59.889102Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
db | 2021-10-06T12:12:59.997461Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
MinIO | Attempting encryption of all config, IAM users and policies on MinIO backend
MinIO | Endpoint:
MinIO |
MinIO | Browser Access:
MinIO |
MinIO |
MinIO | Object API (Amazon S3 compatible):
MinIO | Go: https://docs.min.io/docs/golang-client-quickstart-guide
MinIO | Java: https://docs.min.io/docs/java-client-quickstart-guide
MinIO | Python: https://docs.min.io/docs/python-client-quickstart-guide
MinIO | JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
MinIO | .NET: https://docs.min.io/docs/dotnet-client-quickstart-guide
server | 2021/10/06 12:13:02 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 0.1 seconds
server | 2021/10/06 12:13:02 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 0.3 seconds
server | 2021/10/06 12:13:02 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 0.7 seconds
server | 2021/10/06 12:13:03 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 1.5 seconds
db | 2021-10-06 12:13:04+00:00 [Note] [Entrypoint]: Database files initialized
db | 2021-10-06 12:13:04+00:00 [Note] [Entrypoint]: Starting temporary server
db | 2021-10-06T12:13:04.422603Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 93
db | 2021-10-06T12:13:04.439806Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
db | 2021-10-06T12:13:04.575773Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
db | 2021-10-06T12:13:04.827307Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
db | 2021-10-06T12:13:04.827865Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
db | 2021-10-06T12:13:04.832827Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
db | 2021-10-06T12:13:04.834132Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
db | 2021-10-06T12:13:04.841629Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
db | 2021-10-06T12:13:04.855748Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /var/run/mysqld/mysqlx.sock
db | 2021-10-06T12:13:04.855801Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 0 MySQL Community Server - GPL.
db | 2021-10-06 12:13:04+00:00 [Note] [Entrypoint]: Temporary server started.
server | 2021/10/06 12:13:05 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 3.1 seconds
db | Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
db | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
db | Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
db | Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
db | 2021-10-06 12:13:06+00:00 [Note] [Entrypoint]: Creating database DBMLFLOW
db | 2021-10-06 12:13:06+00:00 [Note] [Entrypoint]: Creating user MLFLOW
db | 2021-10-06 12:13:06+00:00 [Note] [Entrypoint]: Giving user MLFLOW access to schema DBMLFLOW
db |
db | 2021-10-06 12:13:06+00:00 [Note] [Entrypoint]: Stopping temporary server
db | 2021-10-06T12:13:06.948482Z 13 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.26).
server | 2021/10/06 12:13:08 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
server | (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'db' ([Errno 111] Connection refused)")
server | (Background on this error at: https://sqlalche.me/e/14/e3q8)
server | Operation will be retried in 6.3 seconds
db | 2021-10-06T12:13:08.716131Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.26) MySQL Community Server - GPL.
db | 2021-10-06 12:13:08+00:00 [Note] [Entrypoint]: Temporary server stopped
db |
db | 2021-10-06 12:13:08+00:00 [Note] [Entrypoint]: MySQL init process done. Ready for start up.
db |
db | 2021-10-06T12:13:09.159115Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 1
db | 2021-10-06T12:13:09.167405Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
db | 2021-10-06T12:13:09.298925Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
db | 2021-10-06T12:13:09.488958Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
db | 2021-10-06T12:13:09.489087Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
db | 2021-10-06T12:13:09.489934Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
db | 2021-10-06T12:13:09.490169Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
db | 2021-10-06T12:13:09.494728Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
db | 2021-10-06T12:13:09.509856Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
db | 2021-10-06T12:13:09.509982Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
db | mbind: Operation not permitted
server | 2021/10/06 12:13:14 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
server | 2021/10/06 12:13:14 INFO mlflow.store.db.utils: Updating database tables
server | INFO [alembic.runtime.migration] Context impl MySQLImpl.
server | INFO [alembic.runtime.migration] Will assume non-transactional DDL.
server | INFO [alembic.runtime.migration] Running upgrade -> 451aebb31d03, add metric step
server | INFO [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
server | INFO [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
server | INFO [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
server | INFO [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
server | INFO [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
server | INFO [89d4b8295536_create_latest_metrics_table_py] Migration complete!
server | INFO [alembic.runtime.migration] Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
server | INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Adding registered_models and model_versions tables to database.
server | INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Migration complete!
server | INFO [alembic.runtime.migration] Running upgrade 2b4d017a5e9b -> cfd24bdc0731, Update run status constraint with killed
server | INFO [alembic.runtime.migration] Running upgrade cfd24bdc0731 -> 0a8213491aaa, drop_duplicate_killed_constraint
server | INFO [alembic.runtime.migration] Running upgrade 0a8213491aaa -> 728d730b5ebd, add registered model tags table
server | INFO [alembic.runtime.migration] Running upgrade 728d730b5ebd -> 27a6a02d2cf1, add model version tags table
server | INFO [alembic.runtime.migration] Running upgrade 27a6a02d2cf1 -> 84291f40a231, add run_link to model_version
server | INFO [alembic.runtime.migration] Running upgrade 84291f40a231 -> a8c4a736bde6, allow nulls for run_id
server | INFO [alembic.runtime.migration] Running upgrade a8c4a736bde6 -> 39d1c3be5f05, add_is_nan_constraint_for_metrics_tables_if_necessary
server | INFO [alembic.runtime.migration] Running upgrade 39d1c3be5f05 -> c48cb773bb87, reset_default_value_for_is_nan_in_metrics_table_for_mysql
server | INFO [alembic.runtime.migration] Context impl MySQLImpl.
server | INFO [alembic.runtime.migration] Will assume non-transactional DDL.
db | mbind: Operation not permitted
server | [2021-10-06 12:13:16 +0000] [17] [INFO] Starting gunicorn 20.1.0
server | [2021-10-06 12:13:16 +0000] [17] [INFO] Listening at: (17)
server | [2021-10-06 12:13:16 +0000] [17] [INFO] Using worker: sync
server | [2021-10-06 12:13:16 +0000] [19] [INFO] Booting worker with pid: 19
server | [2021-10-06 12:13:16 +0000] [20] [INFO] Booting worker with pid: 20
server | [2021-10-06 12:13:16 +0000] [21] [INFO] Booting worker with pid: 21
server | [2021-10-06 12:13:16 +0000] [22] [INFO] Booting worker with pid: 22
The second line of the output, "- mlflow Error", makes me suspicious, but I think it appears only because the other builds had not finished yet.
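The repeated "Connection refused" warnings in the log above appear while MySQL is still initializing, and the retry delays printed there (0.1, 0.3, 0.7, 1.5, 3.1, 6.3 seconds) roughly double each time. The following is a minimal sketch of that wait-and-retry pattern, not MLflow's actual implementation: the formula is inferred from the log, and `backoff_delays` and `wait_for_port` are hypothetical helper names.

```python
import socket
import time


def backoff_delays(attempts, base=0.1):
    """Delays following the pattern seen in the log: 0.1, 0.3, 0.7, 1.5, ..."""
    # Assumption: delay_n = base * (2**(n + 1) - 1), inferred from the log output.
    return [round(base * (2 ** (n + 1) - 1), 1) for n in range(attempts)]


def wait_for_port(host, port, attempts=6,
                  connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds, else False."""
    for delay in backoff_delays(attempts):
        try:
            # Open and immediately close a connection to test reachability.
            connect((host, port), timeout=1).close()
            return True
        except OSError:
            # Covers ConnectionRefusedError while the database is starting.
            sleep(delay)
    return False
```

Inside the Compose network, `wait_for_port("db", 3306)` would correspond to the server waiting for the `db` service. The `connect` and `sleep` parameters exist only so the function can be exercised without a real database.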
Then I set the environment variables on the client so that my script can communicate with both stores:
import os
os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://localhost:9000/'
os.environ['AWS_ACCESS_KEY_ID'] = 'key'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'pw'
remote_server_uri = "http://localhost:5000/" # server URI
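One way to check that these values are accepted by MinIO, independently of MLflow, is to build a boto3 client from the same environment variables and list the buckets. This is only a sketch, assuming boto3 is installed on the client; `s3_client_kwargs` and `list_bucket_names` are hypothetical helper names, not part of MLflow or boto3.

```python
import os


def s3_client_kwargs(env=os.environ):
    """Collect the S3 settings from the same variables the client script sets."""
    return {
        "endpoint_url": env["MLFLOW_S3_ENDPOINT_URL"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def list_bucket_names(env=os.environ):
    """Return the bucket names visible with the configured credentials."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    client = boto3.client("s3", **s3_client_kwargs(env))
    return [bucket["Name"] for bucket in client.list_buckets()["Buckets"]]
```

With the Compose stack running, calling `list_bucket_names()` should either return the bucket names or raise a `ClientError` if MinIO rejects the credentials, which helps separate credential problems from MLflow problems.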
Finally, I trained a TensorFlow network. Parameters and metrics were stored without problems, although I got some warnings (related to the error shown below). The model, however, was not auto-logged, so I tried to log it manually:
with mlflow.start_run(run_name="test0") as run:
    mlflow.keras.log_model(model2, 'model2')
It doesn't work and gives me the following INFO output (which is essentially an error):
INFO:tensorflow:Assets written to: (path)\Temp\tmpgr5eaha2\model\data\model\assets
INFO:tensorflow:Assets written to: (path)\Temp\tmpgr5eaha2\model\data\model\assets
2021/10/06 14:16:00 ERROR mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: (path)\AppData\Local\Temp\tmpgr5eaha2\model, flavor: keras)
Traceback (most recent call last):
File "(path)\Python\Python39\lib\site-packages\mlflow\utils\environment.py", line 212, in infer_pip_requirements
return _infer_requirements(model_uri, flavor)
File "(path)\Python\Python39\lib\site-packages\mlflow\utils\requirements_utils.py", line 263, in _infer_requirements
modules = _capture_imported_modules(model_uri, flavor)
File "(path)\Python\Python39\lib\site-packages\mlflow\utils\requirements_utils.py", line 221, in _capture_imported_modules
File "(path)\Python\Python39\lib\site-packages\mlflow\utils\requirements_utils.py", line 163, in _run_command
stderr = stderr.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 349: invalid continuation byte
And the following error:
ClientError Traceback (most recent call last)
~\Python\Python39\lib\site-packages\boto3\s3\transfer.py in upload_file(self, filename, bucket, key, callback, extra_args)
278 try:
--> 279 future.result()
280 # If a client error was raised, add the backwards compatibility layer
~\Python\Python39\lib\site-packages\s3transfer\futures.py in result(self)
105 # out of this and propogate the exception.
--> 106 return self._coordinator.result()
107 except KeyboardInterrupt as e:
~\Python\Python39\lib\site-packages\s3transfer\futures.py in result(self)
264 if self._exception:
--> 265 raise self._exception
266 return self._result
~\Python\Python39\lib\site-packages\s3transfer\tasks.py in __call__(self)
125 if not self._transfer_coordinator.done():
--> 126 return self._execute_main(kwargs)
127 except Exception as e:
~\Python\Python39\lib\site-packages\s3transfer\tasks.py in _execute_main(self, kwargs)
--> 150 return_value = self._main(**kwargs)
151 # If the task is the final task, then set the TransferFuture's
~\Python\Python39\lib\site-packages\s3transfer\upload.py in _main(self, client, fileobj, bucket, key, extra_args)
693 with fileobj as body:
--> 694 client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
~\Python\Python39\lib\site-packages\botocore\client.py in _api_call(self, *args, **kwargs)
385 # The "self" in this scope is referring to the BaseClient.
--> 386 return self._make_api_call(operation_name, kwargs)
~\Python\Python39\lib\site-packages\botocore\client.py in _make_api_call(self, operation_name, api_params)
704 error_class = self.exceptions.from_code(error_code)
--> 705 raise error_class(parsed_response, operation_name)
706 else:
ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The Access Key Id you provided does not exist in our records.
During handling of the above exception, another exception occurred:
S3UploadFailedError Traceback (most recent call last)
C:\Users\FCAIZA~1\AppData\Local\Temp/ipykernel_7164/2476247499.py in <module>
1 with mlflow.start_run(run_name = "test0") as run:
----> 3 mlflow.keras.log_model(model2, 'model2')
5 mlflow.end_run()
~\Python\Python39\lib\site-packages\mlflow\keras.py in log_model(keras_model, artifact_path, conda_env, custom_objects, keras_module, registered_model_name, signature, input_example, await_registration_for, pip_requirements, extra_pip_requirements, **kwargs)
402 mlflow.keras.log_model(keras_model, "models")
403 """
--> 404 Model.log(
405 artifact_path=artifact_path,
406 flavor=mlflow.keras,
~\Python\Python39\lib\site-packages\mlflow\models\model.py in log(cls, artifact_path, flavor, registered_model_name, await_registration_for, **kwargs)
186 mlflow_model = cls(artifact_path=artifact_path, run_id=run_id)
187 flavor.save_model(path=local_path, mlflow_model=mlflow_model, **kwargs)
--> 188 mlflow.tracking.fluent.log_artifacts(local_path, artifact_path)
189 try:
190 mlflow.tracking.fluent._record_logged_model(mlflow_model)
~\Python\Python39\lib\site-packages\mlflow\tracking\fluent.py in log_artifacts(local_dir, artifact_path)
582 """
583 run_id = _get_or_start_run().info.run_id
--> 584 MlflowClient().log_artifacts(run_id, local_dir, artifact_path)
~\Python\Python39\lib\site-packages\mlflow\tracking\client.py in log_artifacts(self, run_id, local_dir, artifact_path)
975 is_dir: True
976 """
--> 977 self._tracking_client.log_artifacts(run_id, local_dir, artifact_path)
979 @contextlib.contextmanager
~\Python\Python39\lib\site-packages\mlflow\tracking\_tracking_service\client.py in log_artifacts(self, run_id, local_dir, artifact_path)
332 :param artifact_path: If provided, the directory in ``artifact_uri`` to write to.
333 """
--> 334 self._get_artifact_repo(run_id).log_artifacts(local_dir, artifact_path)
336 def list_artifacts(self, run_id, path=None):
~\Python\Python39\lib\site-packages\mlflow\store\artifact\s3_artifact_repo.py in log_artifacts(self, local_dir, artifact_path)
102 upload_path = posixpath.join(dest_path, rel_path)
103 for f in filenames:
--> 104 self._upload_file(
105 s3_client=s3_client,
106 local_file=os.path.join(root, f),
~\Python\Python39\lib\site-packages\mlflow\store\artifact\s3_artifact_repo.py in _upload_file(self, s3_client, local_file, bucket, key)
78 if environ_extra_args is not None:
79 extra_args.update(environ_extra_args)
---> 80 s3_client.upload_file(Filename=local_file, Bucket=bucket, Key=key, ExtraArgs=extra_args)
82 def log_artifact(self, local_file, artifact_path=None):
~\Python\Python39\lib\site-packages\boto3\s3\inject.py in upload_file(self, Filename, Bucket, Key, ExtraArgs, Callback, Config)
128 """
129 with S3Transfer(self, Config) as transfer:
--> 130 return transfer.upload_file(
131 filename=Filename, bucket=Bucket, key=Key,
132 extra_args=ExtraArgs, callback=Callback)
~\Python\Python39\lib\site-packages\boto3\s3\transfer.py in upload_file(self, filename, bucket, key, callback, extra_args)
283 # client error.
284 except ClientError as e:
--> 285 raise S3UploadFailedError(
286 "Failed to upload %s to %s: %s" % (
287 filename, '/'.join([bucket, key]), e))
S3UploadFailedError: Failed to upload (path)\AppData\Local\Temp\tmpgr5eaha2\model\conda.yaml to artifacts/1/5ae5fcef2d07432d811c3d7eb534382c/artifacts/model2/conda.yaml: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The Access Key Id you provided does not exist in our records.
I found the solution to this issue. It is a tricky problem caused by Spanish characters: my system's user profile under "C:/" is "fcañizares" (Cañizares is my first surname). I created another user named "fcanizares" and everything now works fine. I hope you find this solution helpful.
PS: Moral of the story: get rid of the strange characters!
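The UnicodeDecodeError in the traceback fits this explanation: byte 0xf1 is "ñ" in the Western European Windows code page, and it is not valid UTF-8 when followed by an ASCII letter. Below is a minimal sketch that reproduces the message. It assumes the subprocess output was encoded as cp1252 (the post does not say which code page was in use), and the path is made up around the real user name.

```python
# Hypothetical path for illustration; only the user name comes from the post.
profile_path = "C:\\Users\\fcañizares\\AppData\\Local\\Temp"

# Assumption: the failing stderr bytes were produced in cp1252.
raw = profile_path.encode("cp1252")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or describe the first byte that cannot be decoded."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad_byte = hex(data[exc.start])
        return f"{encoding} failed: byte {bad_byte} ({exc.reason})"


print(try_decode(raw, "utf-8"))   # reports the byte that UTF-8 rejects
print(try_decode(raw, "cp1252"))  # round-trips back to the original path
```

Decoding with the code page that produced the bytes succeeds, while UTF-8 stops at the "ñ". That is why a profile name without the character avoids the problem.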