Tags: google-cloud-platform, google-cloud-storage, versioning, lifecycle

GCP Storage versioning: Deleting files with more than two revisions


I am experimenting with Object Lifecycle Management on GCP Storage buckets. I want to delete every object version that has two or more newer versions in the bucket. For this I created the following rule:

    {
      "lifecycle": {
        "rule": [
          {
            "action": {"type": "Delete"},
            "condition": {
              "numNewerVersions": 2
            }
          }
        ]
      }
    }

Versioning is enabled on the bucket as verified below

     $ gsutil versioning get gs://<my bucket>/
     gs://<my bucket>/: Enabled  

inputFile.txt has three revisions inside the bucket, as shown below.

gsutil ls -a gs://bucketdataflowtest/
gs://<my bucket>//inputFile.txt#1597038772164786
gs://<my bucket>//inputFile.txt#1600169465982831
gs://<my bucket>//inputFile.txt#1600680502763401
gs://<my bucket>//jsonSchema.json#1597038769578689
gs://<my bucket>//transformCSVtoJSON.js#1597038773640155

After applying the rule, I verified that it had been applied:

  $ gsutil lifecycle  get  gs://<my bucket>/
  {"rule": [{"action": {"type": "Delete"}, "condition": {"numNewerVersions": 2}}]}
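Note that the file passed to gsutil wraps the rules in a "lifecycle" key, while the output above does not. As a minimal sketch (the helper name `lifecycle_rules` is hypothetical, and this only compares the two JSON documents locally, it does not talk to Cloud Storage), the two forms can be checked for equivalence like this:

```python
import json


def lifecycle_rules(config_text):
    """Return the rule list from a lifecycle JSON document.

    Accepts both shapes seen in this post: with and without
    the outer "lifecycle" wrapper.
    """
    config = json.loads(config_text)
    config = config.get("lifecycle", config)  # unwrap if the wrapper is present
    return config.get("rule", [])


# The configuration as submitted (wrapped) and as reported by
# `gsutil lifecycle get` (unwrapped), both copied from this post.
submitted = (
    '{"lifecycle": {"rule": [{"action": {"type": "Delete"}, '
    '"condition": {"numNewerVersions": 2}}]}}'
)
reported = (
    '{"rule": [{"action": {"type": "Delete"}, '
    '"condition": {"numNewerVersions": 2}}]}'
)

rules_match = lifecycle_rules(submitted) == lifecycle_rules(reported)
print(rules_match)
```

If the two rule lists compare equal, the configuration on the bucket is the one that was intended, and the missing deletions have to be explained some other way.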

But I don't see the older revisions getting deleted. As shown below, four revisions of inputFile.txt still exist, including the live version.

      $ gsutil ls -la gs://<my bucket>/
   209  2020-08-10T05:52:52Z  gs://<my bucket>/inputFile.txt#1597038772164786  metageneration=1
   347  2020-09-15T11:31:05Z  gs://<my bucket>/inputFile.txt#1600169465982831  metageneration=1
   347  2020-09-21T09:28:22Z  gs://<my bucket>/inputFile.txt#1600680502763401  metageneration=1
   347  2020-09-21T09:48:00Z  gs://<my bucket>/inputFile.txt#1600681680007546  metageneration=1
   571  2020-08-10T05:52:49Z  gs://<my bucket>/jsonSchema.json#1597038769578689  metageneration=1
   495  2020-08-10T05:52:53Z  gs://<my bucket>/transformCSVtoJSON.js#1597038773640155  metageneration=1
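For reference, here is a rough sketch of which of these versions the rule would be expected to select once it runs. It assumes that numNewerVersions: N matches a version when at least N newer versions of the same object exist (counting the live one), which is how I read the documentation. The function name is hypothetical and this is only a local simulation, not what Cloud Storage executes.

```python
def versions_matching_num_newer(generations, num_newer_versions):
    """Return the generations that have at least `num_newer_versions` newer versions."""
    ordered = sorted(generations, reverse=True)  # newest first
    # The version at index i has exactly i newer versions of the same object.
    return [g for i, g in enumerate(ordered) if i >= num_newer_versions]


# Generation numbers of inputFile.txt from the listing above.
generations = [
    1597038772164786,
    1600169465982831,
    1600680502763401,
    1600681680007546,  # newest, i.e. the live version
]

to_delete = versions_matching_num_newer(generations, 2)
print(to_delete)
```

Under that assumption, the two oldest generations would be removed, leaving the live version and one noncurrent version.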

Is anything wrong with my lifecycle rule? The bucket uses the STANDARD storage class.

UPDATE:

As clarified by @guillaume blaquiere, I will wait 24 hours to check whether the policy has taken effect.

I need additional clarification on the scenario below.

I have written lifecycle rules to delete all live objects older than 60 days and all noncurrent versions older than 70 days. When the rules are applied, every live object older than 60 days will be deleted. But since object versioning is enabled on the bucket, deleting a live object creates a noncurrent version of it. Given the second rule in the same configuration, when will these noncurrent versions be deleted? Will that happen a further 70 days later, or only 10 additional days later? I hope that is clear. The rules for this scenario are shown below:

    {
      "lifecycle": {
        "rule": [
          {
            "action": {"type": "Delete"},
            "condition": {
              "age": 60,
              "isLive": true
            }
          },
          {
            "action": {"type": "Delete"},
            "condition": {
              "age": 70,
              "isLive": false
            }
          }
        ]
      }
    }
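To make the two possible outcomes concrete, here is a minimal sketch of the timeline. It assumes that the age condition is counted from the object's creation time and not from the moment the version became noncurrent; that is my understanding of the documentation, but treat it as an assumption. The creation timestamp is only an illustrative value borrowed from the earlier listing, and `first_match_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def first_match_time(created, age_days):
    """Earliest time an `age` condition can match, assuming age is counted from creation."""
    return created + timedelta(days=age_days)


# Illustrative creation time (assumption: taken from the oldest listed version).
created = datetime(2020, 8, 10, 5, 52, 52, tzinfo=timezone.utc)

live_rule_at = first_match_time(created, 60)        # live object becomes noncurrent
noncurrent_rule_at = first_match_time(created, 70)  # noncurrent version is removed

gap_days = (noncurrent_rule_at - live_rule_at).days
print(live_rule_at.date(), noncurrent_rule_at.date(), gap_days)
```

If that assumption holds, the noncurrent version would become eligible roughly 10 days after the live object was deleted, not 70 days after. The actual removal can still lag, since lifecycle actions are performed asynchronously.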

Kindly clarify


Solution

  • Lifecycle management is an asynchronous process, and your configuration can take up to 24 hours to be taken into account. You also don't know when it will scan your bucket. Here are the details from the documentation:

    Cloud Storage regularly inspects all the objects in a bucket for which Object Lifecycle Management is configured and performs all actions applicable according to the bucket's rules. Cloud Storage performs an action asynchronously, so there can be a lag between when the conditions are satisfied and when the action is taken.

    Updates to your lifecycle configuration may take up to 24 hours to go into effect. This means that when you change your lifecycle configuration, Object Lifecycle Management may still perform actions based on the old configuration for up to 24 hours.