ARM template broken due to breaking change in ServiceFabricNodeBootstrapAgent
We had some problems earlier today with Service Fabric ARM templates that just seemed to have stopped working. We use these templates to spin up on-demand test environments.
The problem appears to have originated in a breaking change in the ServiceFabricNodeBootstrapAgent. Our ARM script to set up a Service Fabric cluster used to work fine with agent version 126.96.36.199. However, if we do a new deployment now, we get version 188.8.131.52 of the agent, which logs the following error in the event log:
Failed starting service, Error: System.ArgumentNullException: Value cannot be null.
I looked at the code using dotPeek and found a breaking change in the behaviour around the dataRoot setting (which can be used to customize the location of the SvcFab directory).
In version 184.108.40.206 of the ServiceFabricNodeBootstrapAgent, the code checks whether the dataRoot field is not null.
However, in version 220.127.116.11, this check is gone:
num = this.ConfigureNode(str1, str2, Path.GetFullPath(dataRoot), Path.GetFullPath(Path.Combine(dataRoot, "Log")));
In this version, dataRoot is passed directly to the Path.GetFullPath method, which then throws an ArgumentNullException if the dataRoot field is not set.
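To make the failure mode concrete, here is a minimal C# sketch. It is not the agent's actual code: the class name DataRootSketch, the methods ResolveUnguarded and ResolveGuarded, and the fallback directory are all hypothetical and exist only for illustration. The one thing it relies on from the framework is that Path.GetFullPath throws an ArgumentNullException when given null.

```csharp
using System;
using System.IO;

// Illustrative only: all names here are hypothetical, not taken from the agent.
public static class DataRootSketch
{
    // Mirrors the shape of the newer call: dataRoot goes straight into Path.GetFullPath.
    public static string[] ResolveUnguarded(string dataRoot)
    {
        return new[]
        {
            // Throws ArgumentNullException when dataRoot is null.
            Path.GetFullPath(dataRoot),
            Path.GetFullPath(Path.Combine(dataRoot, "Log"))
        };
    }

    // Hypothetical guard: use a caller-supplied fallback when dataRoot is null.
    // What the older agent version actually did in that case is not shown in this post.
    public static string[] ResolveGuarded(string dataRoot, string fallbackRoot)
    {
        string root = dataRoot ?? fallbackRoot;
        return ResolveUnguarded(root);
    }

    public static void Main()
    {
        try
        {
            ResolveUnguarded(null);
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine("Unguarded call failed: " + ex.GetType().Name);
        }

        // The fallback location is an assumption chosen only so the sketch runs anywhere.
        string fallback = Path.Combine(Path.GetTempPath(), "SvcFab");
        string[] paths = ResolveGuarded(null, fallback);
        Console.WriteLine("Guarded call resolved: " + paths[0]);
    }
}
```

The guarded variant only shows the general idea of a null check before the path is resolved; a real fix could equally well fail with a clearer error message instead of falling back.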
So where should this dataRoot value come from? It is loaded from the current.config file that’s in the same directory as the bootstrapper service. However, the current.config file will only contain the value if it has been explicitly set in the ARM template (via the dataPath setting).
So if your Service Fabric ARM deployments are suddenly not working anymore, check whether you’ve specified the dataPath setting.
I’ve opened an issue for this at https://github.com/Azure/service-fabric-issues/issues/197.