I have a more general requirement to track changes in asset files that are committed to source control and packaged inside the binaries. For now I am implementing it in a unit-testing context, and I am facing a potential problem for the future. Before asking the TL;DR question, I will give some context.
Scenario
Some application assets are loaded via ClassPathResource [1] from CSV files committed to the Git repository, and they may change from time to time. The change occurs across commits, but for a running application it occurs across different versions of the application.
My test solution
I have implemented the following mechanism to alert me about changes in the resource:
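@Before
public void setUp() throws Exception
{
    // Fail fast if assets.csv differs from the version this test was written against;
    // try-with-resources closes the classpath stream after hashing.
    try (InputStream in = new ClassPathResource("META-INF/resources/assets.csv").getInputStream()) {
        assertEquals("Resource file has changed. Make sure the test reflects the changes in the file and update the checksum",
                MD5_OF_FILE, DigestUtils.md5Hex(in));
    }
}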
Basically, I want my unit tests to fail until I explicitly put the file's checksum into the code. I run md5sum assets.csv and hardcode the result, so the tests know they are working with a fixed version of the file.
Problem
I ran the tests on my own Windows box and they worked like a charm. After switching to Linux, I found that they failed. I immediately realized that this might be due to line endings, which I had completely forgotten about.
In this case, Git is configured to commit files with LF line endings but to check them out with CRLF on Windows. This configuration is reasonable for working with source code.
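A quick way to confirm the line-ending explanation: the same content hashed once with LF and once with CRLF line endings yields two different MD5 values. The sketch below uses only the JDK's MessageDigest (not commons-codec's DigestUtils), and the CSV content is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LineEndingDigestDemo {

    // Hex-encoded MD5 of the given bytes (same lowercase hex format as DigestUtils.md5Hex).
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Same hypothetical CSV content: as committed (LF) and as checked out on Windows (CRLF).
        String lf = "id,name\n1,foo\n";
        String crlf = lf.replace("\n", "\r\n");

        String lfHash = md5Hex(lf.getBytes(StandardCharsets.UTF_8));
        String crlfHash = md5Hex(crlf.getBytes(StandardCharsets.UTF_8));

        System.out.println("LF:   " + lfHash);
        System.out.println("CRLF: " + crlfHash);
        System.out.println("equal? " + lfHash.equals(crlfHash)); // prints "equal? false"
    }
}
```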
So I need a way to check whether the asset file has changed that tolerates a machine changing or reinterpreting its line endings. This matters even more for the runtime application, which will store the file hash, compare it against the current assets file (which may have changed), and take corrective action on a mismatch, i.e. reload the assets.
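The runtime check described above (store a fingerprint, compare it with the current file, reload on mismatch) could look roughly like the following sketch. AssetReloader and its constructor parameters are hypothetical names; the hash function is pluggable, and the demo uses the non-cryptographic String.hashCode over CRLF-normalized text purely as a stand-in. Persisting the stored hash between runs is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical: reloads assets when the stored fingerprint no longer matches. */
public class AssetReloader {
    private final Supplier<byte[]> source;          // reads the current asset file
    private final Function<byte[], String> hasher;  // any (ideally line-ending-insensitive) hash
    private final Consumer<byte[]> loader;          // corrective action: (re)load the assets
    private String storedHash;                      // fingerprint of the last loaded version

    public AssetReloader(Supplier<byte[]> source, Function<byte[], String> hasher,
                         Consumer<byte[]> loader, String storedHash) {
        this.source = source;
        this.hasher = hasher;
        this.loader = loader;
        this.storedHash = storedHash;
    }

    /** Returns true if the assets were reloaded because the fingerprint changed. */
    public boolean checkAndReload() {
        byte[] content = source.get();
        String current = hasher.apply(content);
        if (Objects.equals(current, storedHash)) {
            return false;            // unchanged: nothing to do
        }
        loader.accept(content);      // changed: reload the assets
        storedHash = current;        // remember the new fingerprint (in memory only here)
        return true;
    }

    public static void main(String[] args) {
        // Stand-in hasher: String.hashCode of the text with CRLF turned into LF.
        Function<byte[], String> hasher = b -> Integer.toHexString(
                new String(b, StandardCharsets.UTF_8).replace("\r\n", "\n").hashCode());
        String[] file = {"id,name\n1,foo\n"};
        int[] loads = {0};
        AssetReloader r = new AssetReloader(
                () -> file[0].getBytes(StandardCharsets.UTF_8), hasher, c -> loads[0]++, null);

        r.checkAndReload();                  // first run: no stored hash, so it loads
        file[0] = "id,name\r\n1,foo\r\n";    // same content, CRLF line endings
        r.checkAndReload();                  // no reload
        file[0] = "id,name\n1,bar\n";        // real content change
        r.checkAndReload();                  // reload
        System.out.println(loads[0]);        // prints 2
    }
}
```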
Given a text file for which I can compute and store any kind of hash (not necessarily cryptographic; I used MD5), how can I tell whether it has changed, regardless of the environment it is processed in, which may alter the line endings?
Note that I have a requirement not to embed a versioning scheme in the asset itself (e.g. an incrementing version number in the first row), since developers will fail to update it correctly.
[1] Spring Framework class wrapping Class.getResourceAsStream
One solution is to normalize the file to a chosen line ending, i.e. always CRLF or always LF, and then compute the cryptographic hash over the normalized content.
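A minimal in-memory sketch of this normalize-then-hash approach, assuming the file is small enough to read into memory and uses an ASCII-compatible encoding such as UTF-8 (so CR and LF bytes never occur inside a multi-byte character). The class name NormalizedChecksum is hypothetical; only CR bytes immediately followed by LF are removed, so lone CRs still count as content.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public final class NormalizedChecksum {

    /** Returns a copy of data with every CRLF pair replaced by a single LF. */
    public static byte[] crlfToLf(byte[] data) {
        byte[] out = new byte[data.length];
        int n = 0;
        for (int i = 0; i < data.length; i++) {
            // Drop a CR only when the next byte is LF; lone CRs are kept.
            if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                continue;
            }
            out[n++] = data[i];
        }
        return Arrays.copyOf(out, n);
    }

    /** Lowercase hex MD5 of the LF-normalized bytes. */
    public static String normalizedMd5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(crlfToLf(data));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        // Usage: java NormalizedChecksum path/to/assets.csv
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(normalizedMd5Hex(raw));
    }
}
```

For a plain UTF-8 file without a byte-order mark, the printed value should agree with what dos2unix < assets.csv | md5sum reports, since both end up hashing the LF-only bytes.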
E.g. compute the reference checksum with dos2unix < assets.csv | md5sum (normalize first, then hash), and in code use a suitable InputStream that normalizes the file on the fly.
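For the "normalize on the fly" part, one option is a small InputStream wrapper that turns CRLF into LF as bytes are read, so the file never has to be loaded into memory at once. CrlfToLfInputStream and md5HexNormalized are hypothetical names, not part of Spring or commons-codec, and the same encoding assumption applies (ASCII-compatible, e.g. UTF-8).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical wrapper: turns every CRLF into LF while the bytes are being read. */
public final class CrlfToLfInputStream extends InputStream {
    private static final int NONE = -2; // marker: no look-ahead byte buffered
    private final InputStream in;
    private int pending = NONE;         // byte read ahead after a lone CR (may be -1 = EOF)

    public CrlfToLfInputStream(InputStream in) {
        this.in = new BufferedInputStream(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        if (pending != NONE) {
            b = pending;                // replay the look-ahead byte first
            pending = NONE;
        } else {
            b = in.read();
        }
        if (b == '\r') {
            int next = in.read();
            if (next == '\n') {
                return '\n';            // CRLF -> LF
            }
            pending = next;             // lone CR: keep it, replay the look-ahead next time
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Lowercase hex MD5 of the content after CRLF -> LF normalization; closes the stream. */
    public static String md5HexNormalized(InputStream raw) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
        try (InputStream in = new CrlfToLfInputStream(raw)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte d : md.digest()) {
            sb.append(String.format("%02x", d));
        }
        return sb.toString();
    }
}
```

In the test from the question, the assertion could then compare MD5_OF_FILE with CrlfToLfInputStream.md5HexNormalized(new ClassPathResource("META-INF/resources/assets.csv").getInputStream()); alternatively, commons-codec's DigestUtils.md5Hex(InputStream) can be given a new CrlfToLfInputStream(...) directly. Extending InputStream and overriding only read() keeps the wrapper correct, because the inherited bulk read(byte[], int, int) delegates to it; the BufferedInputStream underneath keeps that per-byte reading from hitting the underlying stream every time.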