docker redis label-studio

How to properly configure Label Studio to use Redis as a data source (in a Docker stack)


I'm trying to set up a Docker stack for a data science project, and I want to use Redis so that services can exchange data.

I followed the documentation provided by Label Studio, but a lot of details are missing and my implementation doesn't work.

Specifically: Label Studio can register Redis as a data source but not as a data target, and as a source it doesn't retrieve my task data.

What I tried

My Docker Compose file

I removed every service unrelated to Label Studio; the variables come from a .env file.
The Postgres part works fine, but I kept it in the example because it's part of the Redis config.

services:
  postgres:
    image: postgres:16-alpine
    container_name: postgres
    ports:
      - ${POSTGRES_PORT}:5432
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - PGDATA=/var/lib/postgresql/data/pgdata
      - POSTGRES_PORT=${POSTGRES_PORT}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d postgres"]
      interval: 10s
      timeout: 10s
      retries: 120
    volumes:
      - pgdata:/var/lib/postgresql/data:Z

  redis:
    image: redis:5-alpine
    container_name: redis
    ports:
      - 6379:6379
    volumes:
      - redisdata:/data
    healthcheck:
      test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]
      interval: 10s
      timeout: 10s
      retries: 120
    command: [ "redis-server",
               "--save", "60", "1",
               "--loglevel", "debug",
               "--requirepass", "${REDIS_PASSWORD}"]

  label-studio:
    image: heartexlabs/label-studio:latest
    container_name: label-studio
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - 8081:8080
    environment:
      - DJANGO_DB=default
      - POSTGRE_HOST=postgres
      - POSTGRE_PORT=${POSTGRE_PORT}
      - POSTGRE_NAME=${POSTGRE_NAME}
      - POSTGRE_USER=${POSTGRE_USER}
      - POSTGRE_PASSWORD=${POSTGRE_PASSWORD}
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_LOCATION=redis:6379
      - REDIS_DB=0
      - REDIS_PASSWORD=${REDIS_PASSWORD}
    volumes:
      - lsdata:/label-studio/data
    command: ["label-studio",
              "--log-level", "DEBUG"]

volumes:
  pgdata:
    driver: local
  redisdata:
    driver: local
  lsdata:
    driver: local
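Since the point of the stack is for other services to exchange data through this same Redis instance, each of them needs the same host, port, password and database number. A minimal stdlib-only sketch, assuming the other services receive the same `REDIS_HOST`, `REDIS_PORT`, `REDIS_DB` and `REDIS_PASSWORD` variables as the `label-studio` service above; `redis_url_from_env` is a hypothetical helper, not part of Label Studio:

```python
import os
from urllib.parse import quote


def redis_url_from_env(env=None):
    """Build a redis:// URL from REDIS_* variables (hypothetical helper).

    The database number goes into the URL path, so every service that uses
    this helper points at the same logical database explicitly.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "redis")  # assumed default: the Compose service name
    port = int(env.get("REDIS_PORT", "6379"))
    db = int(env.get("REDIS_DB", "0"))
    password = env.get("REDIS_PASSWORD", "")
    # Percent-encode the password so characters such as '@' or ':' cannot break the URL.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


if __name__ == "__main__":
    # Example values only; in the stack these would come from the .env file.
    print(redis_url_from_env({"REDIS_HOST": "redis", "REDIS_PASSWORD": "s3cr3t"}))
```

With redis-py installed, a URL like this can be handed to `redis.Redis.from_url(...)`, which keeps the database choice in one visible place instead of relying on a client default.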

Redis

Redis is running and contains task data. I tested the following key/value formats:

labelstudio:ls-task-1 '{"text":"some text"}'
labelstudio:ls-task-2 '{"id":0, "data": {"texte": "some text"}}'


ls-task-1 '{"text":"some text"}'
ls-task-2 '{"id":0, "data": {"texte": "some text"}}'
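To see which of these keys a prefix-based import would pick up, here is a stdlib-only sketch in which a dict stands in for Redis. It assumes (not confirmed here) that the storage's Path is used as a key prefix, i.e. `KEYS labelstudio*`, and that a value containing a `data` object is treated as a full task while anything else is treated as the task data itself. Note that the second format uses the key `texte`, which `$text` in the labeling config would not match.

```python
import fnmatch
import json

# Hypothetical in-memory stand-in for one Redis logical database,
# holding the keys/values listed above.
fake_db = {
    "labelstudio:ls-task-1": '{"text":"some text"}',
    "labelstudio:ls-task-2": '{"id":0, "data": {"texte": "some text"}}',
    "ls-task-1": '{"text":"some text"}',
    "ls-task-2": '{"id":0, "data": {"texte": "some text"}}',
}


def keys_with_prefix(db, prefix):
    """Emulate `KEYS <prefix>*`: glob-match every key against the prefix."""
    return sorted(k for k in db if fnmatch.fnmatchcase(k, prefix + "*"))


def task_data(raw_value):
    """Return the fields a labeling config could reference as $name.

    Assumption: a value with a "data" object is a full task; anything
    else is treated as the task's data itself.
    """
    task = json.loads(raw_value)
    if isinstance(task.get("data"), dict):
        return task["data"]
    return task


for key in keys_with_prefix(fake_db, "labelstudio"):
    print(key, "-> has 'text':", "text" in task_data(fake_db[key]))
```

Running it prints `True` for `labelstudio:ls-task-1` and `False` for `labelstudio:ls-task-2`; the un-prefixed keys are not selected at all.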

Label Studio

Labeling interface config

<View>
  <Text name="text" value="$text"/>
  <View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
    <Header value="Some themes"/>
    <Choices name="theme" toName="text" choice="multiple" showInLine="true">
      <Choice value="somevalue">Some Choice</Choice>
      <Choice value="othervalue">Other Choice</Choice>
    </Choices>
  </View>
</View>
<!-- {
  "data": {"text": "Some Text"}
} -->
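Two things that are easy to miss in a config like this are a malformed tag and a `$name` that matches no key in the task data; either can leave tasks undisplayable. As a quick offline check, this stdlib-only sketch parses a trimmed copy of the config and lists the `$` variables it references. `referenced_fields` is a hypothetical helper, and treating every `value="$name"` attribute as a data reference is a simplification of how Label Studio resolves variables.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the labeling config above (styling and Header omitted).
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="theme" toName="text" choice="multiple" showInLine="true">
    <Choice value="somevalue">Some Choice</Choice>
    <Choice value="othervalue">Other Choice</Choice>
  </Choices>
</View>
"""


def referenced_fields(config_xml):
    """Parse the config and return the task fields it references as $name.

    ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML,
    such as a broken closing tag.
    """
    root = ET.fromstring(config_xml)
    fields = set()
    for element in root.iter():
        match = re.fullmatch(r"\$(\w+)", element.get("value", ""))
        if match:
            fields.add(match.group(1))
    return fields


task = {"texte": "some text"}  # data shape of the ls-task-2 examples
print("missing fields:", referenced_fields(LABEL_CONFIG) - set(task))
```

For the `texte` task this reports `text` as missing; for the `{"text": ...}` format the difference is empty.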

Cloud Storage config

Storage Type: Redis
Path: labelstudio
Password:
Host: redis
Port: 6379

How it fails

As a data source

In the logs I can see Label Studio connecting to Redis, but it always shows 0 tasks.

Label Studio logs

[2024-07-11 08:46:24,606] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2
[2024-07-11 08:46:43,954] [io_storages.base_models::sync::454] [INFO] Start syncing storage RedisImportStorage object (1)
[2024-07-11 08:46:43,964] [projects.models::_update_tasks_states::422] [INFO] Starting _update_tasks_states with params: Project risque-juridique (id=1) maximum_annotations 1 and percentage 100
[2024-07-11 08:46:43,971] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:43,971] [django.server::log_message::161] [INFO] "POST /api/storages/redis/1/sync HTTP/1.1" 200 618
[2024-07-11 08:46:44,454] [urllib3.connectionpool::_make_request::474] [DEBUG] https://tele.labelstud.io:443 "POST / HTTP/1.1" 200 0
[2024-07-11 08:47:24,609] [urllib3.connectionpool::_make_request::474] [DEBUG] https://o227124.ingest.sentry.io:443 "POST /api/5820521/envelope/ HTTP/1.1" 200 2

Redis logs

11 Jul 2024 08:46:43.953 - Accepted 192.168.48.6:42558
11 Jul 2024 08:46:43.954 - Client closed connection
11 Jul 2024 08:46:43.963 - Accepted 192.168.48.6:42564
11 Jul 2024 08:46:43.964 - Client closed connection
11 Jul 2024 08:46:45.526 - Accepted 127.0.0.1:57774
11 Jul 2024 08:46:45.526 - Client closed connection
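Redis keeps numbered logical databases with separate key spaces (switched with `SELECT <n>`, or `redis-cli -n <n>`), so a client connected to one database sees none of the keys written to another. That is one plausible explanation for a sync that connects successfully yet finds 0 tasks, and it is consistent with the missing database ID mentioned in the solution at the end of this post. The sketch models this with plain dicts; which database Label Studio's import storage actually read by default is not confirmed here.

```python
import fnmatch

# Hypothetical model of a Redis server: each numbered logical database
# is an independent key space.
server = {
    0: {"labelstudio:ls-task-1": '{"text":"some text"}'},  # where the tasks were written
    1: {},  # some other database a client might be connected to
}


def count_tasks(server, db, prefix):
    """Count keys matching `<prefix>*` in one logical database only."""
    return sum(1 for key in server.get(db, {}) if fnmatch.fnmatchcase(key, prefix + "*"))


for db in sorted(server):
    print(f"db {db}: {count_tasks(server, db, 'labelstudio')} task(s)")
```

This prints one task for `db 0` and zero for `db 1`: the same key prefix gives different answers depending only on the selected database.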

As a data target

Label Studio shows the following error:

Runtime error
Validation error

    validate_connection is not implemented

Version: 1.12.1
Label Studio logs
[2024-07-11 08:51:29,742] [core.utils.common::custom_exception_handler::89] [ERROR] c9a7909a-c865-4f8a-813b-3a4e7918d9a5 [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
Traceback (most recent call last):
  File "/label-studio/label_studio/io_storages/api.py", line 82, in perform_create
    instance.validate_connection()
  File "/label-studio/label_studio/io_storages/base_models.py", line 218, in validate_connection
    raise NotImplementedError('validate_connection is not implemented')
NotImplementedError: validate_connection is not implemented

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/generics.py", line 242, in post
    return self.create(request, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rest_framework/mixins.py", line 19, in create
    self.perform_create(serializer)
  File "/label-studio/label_studio/io_storages/api.py", line 84, in perform_create
    raise ValidationError(exc)
rest_framework.exceptions.ValidationError: [ErrorDetail(string='validate_connection is not implemented', code='invalid')]
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [django.request::log_response::224] [WARNING] Bad Request: /api/storages/export/redis
[2024-07-11 08:51:29,748] [urllib3.connectionpool::_new_conn::1019] [DEBUG] Starting new HTTPS connection (1): tele.labelstud.io:443
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210
[2024-07-11 08:51:29,749] [django.server::log_message::161] [WARNING] "POST /api/storages/export/redis?project=1 HTTP/1.1" 400 210
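The traceback shows the pattern behind this error: the base storage class raises `NotImplementedError` from `validate_connection`, and `perform_create` in the storages API turns that exception into a `ValidationError` (the HTTP 400 above), so the export storage is rejected unless the Redis subclass overrides the method. A simplified model of that pattern follows; every class here, including `FakeClient`, is a hypothetical stand-in rather than Label Studio code.

```python
class ValidationError(Exception):
    """Stand-in for rest_framework.exceptions.ValidationError."""


class StorageBase:
    """Stand-in for the base storage model seen in the traceback."""

    def validate_connection(self):
        raise NotImplementedError("validate_connection is not implemented")


class RedisExportStorageSketch(StorageBase):
    """Hypothetical Redis export storage that overrides the hook.

    `client` is anything with a ping() method, e.g. a redis-py client.
    """

    def __init__(self, client):
        self.client = client

    def validate_connection(self):
        if not self.client.ping():
            raise ConnectionError("Redis did not answer PING")


def perform_create(instance):
    """Simplified API handler: a failed check becomes a ValidationError."""
    try:
        instance.validate_connection()
    except Exception as exc:
        raise ValidationError(str(exc)) from exc
    return instance


class FakeClient:
    """Offline stand-in for a Redis client whose PING succeeds."""

    def ping(self):
        return True


perform_create(RedisExportStorageSketch(FakeClient()))  # passes validation
try:
    perform_create(StorageBase())  # no override, so the base class raises
except ValidationError as err:
    print(err)  # validate_connection is not implemented
```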

Solution

  • I'm answering my own question. Long story short, it was indeed a bug: not only was the validation function for the Redis export storage never written, but the Redis import storage wasn't properly wired into Label Studio either. The Label Studio team has since added the missing parameters to the Redis connection config (the Redis database ID, for one), which now allows tasks to be retrieved from Redis.