python csv pydantic

How to create Pydantic model based on string (s3 access logs) data


I need to parse a line of text that is delimited by spaces (but not strictly: the timestamp, for example, contains a space inside its brackets) into a Pydantic model. The field names are known, and types don't matter for this task, so keeping every field as str is fine.

I'm unfamiliar with Pydantic, but I assume there is a way to leverage the BaseModel.model_validate method, or something similar, to do the parsing more natively.

For the sake of the example, I trimmed both the log and the model.

Example log line:

79a59df900b949e55d DOC-EXAMPLE-BUCKET1 [06/Feb/2019:00:00:38 +0000]

The model:

from pydantic import BaseModel

class S3AccessLogEntry(BaseModel):
    owner: str
    bucket: str
    timestamp: str

Solution

  • Pydantic has no built-in mechanism for splitting a delimited string into fields, and I'm not sure you even need a Pydantic model here. But if having a Pydantic model is important, you can add an alternative constructor:

    from pydantic import BaseModel


    class S3AccessLogEntry(BaseModel):
        owner: str
        bucket: str
        timestamp: str

        @classmethod
        def from_log(cls, line: str) -> "S3AccessLogEntry":
            """Build a model from a single log line."""
            try:
                # maxsplit=2 keeps the space inside the bracketed timestamp intact
                owner, bucket, timestamp = line.split(sep=" ", maxsplit=2)
            except ValueError as exc:
                # Fewer than three parts: raise ValueError or any exception type you prefer
                raise ValueError(f"Malformed log line: {line!r}") from exc

            return cls(owner=owner, bucket=bucket, timestamp=timestamp)


    entry = S3AccessLogEntry.from_log(
        line="79a59df900b949e55d DOC-EXAMPLE-BUCKET1 [06/Feb/2019:00:00:38 +0000]"
    )
    

    Or, if the data is in a CSV-like format with a header row, you can do something like this:

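    import csv

    # csv_data is assumed to be a CSV string whose first row is the header,
    # e.g. "owner,bucket,timestamp"
    csv_logs = csv.DictReader(csv_data.splitlines())

    for row in csv_logs:
        log_entry = S3AccessLogEntry.model_validate(row)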
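If you want model_validate itself to accept the raw line, as the question hoped, one option in Pydantic v2 is a model_validator(mode="before") that turns the string into a dict before field validation runs. This is a sketch, not the only way to do it: it assumes Pydantic v2 and assumes every token is either [bracketed], "double-quoted", or a run of non-space characters. The regex TOKEN_RE and the method name parse_log_line are hypothetical names chosen for illustration, not part of any S3 or Pydantic API.

```python
import re
from typing import Any

from pydantic import BaseModel, model_validator

# Hypothetical tokenizer (an assumption about the log layout): a token is a
# [bracketed] value, a "quoted" value, or a run of non-space characters.
TOKEN_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


class S3AccessLogEntry(BaseModel):
    owner: str
    bucket: str
    timestamp: str

    @model_validator(mode="before")
    @classmethod
    def parse_log_line(cls, data: Any) -> Any:
        # Only intercept raw strings; dicts and other inputs pass through unchanged.
        if not isinstance(data, str):
            return data
        tokens = TOKEN_RE.findall(data)
        names = list(cls.model_fields)  # field names in declaration order
        if len(tokens) < len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(tokens)}")
        # Map the first N tokens onto the declared fields; extra tokens are dropped.
        return dict(zip(names, tokens))


entry = S3AccessLogEntry.model_validate(
    "79a59df900b949e55d DOC-EXAMPLE-BUCKET1 [06/Feb/2019:00:00:38 +0000]"
)
print(entry.timestamp)  # [06/Feb/2019:00:00:38 +0000]
```

Because dict input is passed through unchanged, the csv.DictReader loop still works with this version of the model. A ValueError raised inside the validator surfaces as a pydantic ValidationError, which is itself a subclass of ValueError.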