Search code examples
pentahokettle

Pentaho - Migrating from Database repository to File Repository


I am in process of migrating Pentaho from Database repository to file repository. I have exported the database repository into xml file and then created a file repository and imported the repository... The first issue that I saw after importing is all my database connections are being stored in .ktr and .kjb files, This is going to be a big issue If I update a connection string like updating a password, I have more than a hundreds of sub transformations and jobs, do I have to update this in all those files? Is there any way to ignore the password and other connection settings that is stored in the .ktr and .kjb files and instead use the repository connection or specify it in the .kettle property?

The other issue that I face is When I try to run the master job via kitchen in cmd it does not recognize the sub transformation and jobs. However when I change the Transformation root to ${Internal.Entry.Current.Directory} - the sub transformation is being recognized and processed- As I mentioned I have more than 100 sub transformation and jobs - is there any way to update this root for all jobs and transformation at once.

Kitchen.bat /file:"C:\pentaho-8-1\Dev_Repo\home\jobs\MainProcess\MasterJob.kjb" /level:Basic /logfile:"C:\pentaho-8-1\logs\my-job.txt"

This fails with error (.ktr is not a file or the repository is not defined) withimportedroot

However when I change the root directory to ${Internal.Entry.Current.Directory} it works!

withmodifiedroot


Solution

  • For the database connections, you can make .kdbs in the repository and enter variables for all the properties (Host, Port, Schema, User, etc) and define them in kettle.properties or another properties file.

    This works like a more convenient version of JNDI files, with one properties file per environment. You can easily inspect current values by opening the kettle properties from within the Spoon client (don't edit them or it will mess up the layout!) and you can also put kettle "encrypted" passwords in the properties file.

    PDI will still save copies of the connections into all the .kjb and ktr files (and should in theory update them from .kdb or shared.xml when opening them) but since the contents are just generic variable names (${STAGING_DB_HOST} etc) you will almost never run into problems with this.

    For the transformation filenames, a good text search and replace tool should fix most of your transformations in one go. Include some of the XML tag to prevent replacing too much.