Sitecore scaled environment loses session
Last week I came across a tricky issue which took me several days to fix.
We had built login functionality where a user is authenticated against a 3rd party service, after which we create a Virtual User in Sitecore to authenticate the user with the Sitecore platform.
Everything worked fine (users were created and roles were properly assigned), until we reached an environment which was scaled and had multiple Content Delivery (CD) servers. When a logged-in user hit a different CD than the one it had authenticated against, it would lose its session.
The full setup is built using Sitecore Headless (.NET SDK version) and hosted in Kubernetes. Configuring sticky sessions on the CDs was therefore not a straightforward ‘fix’ for this issue, as it was the .NET frontend application performing the API requests to the CDs rather than the end user's browser.
During our investigations we noticed the following symptoms:
- The user did not get the “.ASPXAUTH” cookie as we were used to on older projects, even though Sitecore was configured to use Forms authentication
- On login the user would get the “ASP.NET_SessionId” cookie
- User data was successfully created and pushed to the ClientData table in the Web database upon login
- No errors showed up in the logs; the only sign was that the Layout Service returned a 401 status code on CDs other than the one used to log in
- Shared session state was configured correctly
The first thing I tried to do was to make sure the .ASPXAUTH cookie was created on login.
This cookie is normally created when you use the Forms authentication provider as configured in the web.config.
After a lot of digging, it turned out that the authentication provider was being overridden through Dependency Injection when Owin authentication was enabled. Owin authentication is currently enabled by default, including on Content Delivery roles.
Following the steps to disable Sitecore Identity (https://doc.sitecore.com/xp/en/developers/102/sitecore-experience-manager/understanding-sitecore-authentication-behavior-changes.html#disable-sitecore-identity) made Dependency Injection default back to the Forms authentication provider. I disabled Owin/Sitecore Identity specifically for Content Delivery roles as we still needed to use it for authentication against the CM.
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore role:require="ContentDelivery">
    <settings>
      <setting name="Owin.Authentication.Enabled" set:value="false" />
      <setting name="FederatedAuthentication.Enabled" set:value="false" />
    </settings>
    <sites>
      <site name="shell" set:loginPage="/sitecore/login" />
      <site name="admin" set:loginPage="/sitecore/admin/login.aspx" />
    </sites>
  </sitecore>
</configuration>
Disabling Owin eventually led to the .ASPXAUTH cookie being created on login, but it did not fix the loss of session.
To fully fix the issue, I had to specify a machine key in the web.config. ASP.NET uses this key to encrypt and sign the .ASPXAUTH cookie when it is issued, and to validate and decrypt it on subsequent HTTP requests. If the key on the receiving server differs from the one used to generate the cookie, the cookie can’t be decrypted and read.
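To make the failure mode concrete, here is a minimal Python sketch of the general idea. It is not ASP.NET's actual cookie format or algorithm; it only signs a payload with HMAC-SHA1, and the names `sign`, `verify`, `cd1_key` and `cd2_key` are hypothetical. It shows why a cookie issued by one CD is rejected by another CD holding a different key.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LENGTH = 20  # an HMAC-SHA1 tag is always 20 bytes


def sign(payload: bytes, validation_key: bytes) -> bytes:
    """Append an HMAC-SHA1 tag to the payload (a stand-in for a signed cookie)."""
    tag = hmac.new(validation_key, payload, hashlib.sha1).digest()
    return payload + tag


def verify(cookie: bytes, validation_key: bytes) -> Optional[bytes]:
    """Return the payload if the tag matches this key, otherwise None."""
    payload, tag = cookie[:-TAG_LENGTH], cookie[-TAG_LENGTH:]
    expected = hmac.new(validation_key, payload, hashlib.sha1).digest()
    return payload if hmac.compare_digest(tag, expected) else None


# Two CD instances that each generated their own key (hypothetical values).
cd1_key = secrets.token_bytes(64)
cd2_key = secrets.token_bytes(64)

cookie = sign(b"user=jane", cd1_key)
print(verify(cookie, cd1_key))  # the issuing CD gets the payload back
print(verify(cookie, cd2_key))  # the other CD cannot validate the cookie
```

With a shared key both calls would return the payload, which is what pinning the machine key achieves across CD instances.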
As I was using a scaled environment in Kubernetes, my Content Delivery containers did not share the same machine key. By generating machine key values in my local IIS and specifying them in the web.config transform file, I was able to get everything to work.
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.web>
    <machineKey xdt:Transform="InsertIfMissing"
                validation="SHA1"
                decryption="AES"
                decryptionKey="69C6D89A7639B35ADBF568F0DD53A9AD808A081120D842A7"
                validationKey="36C0D6B8D1607DC3A029456FFF53F6A570C7527EE6052A7561065AF4D032F111B2E4050F2A4EC32079A34A3258ACB9ED60812EF2FE136027AC917D46EE45F514" />
  </system.web>
</configuration>
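If you don't have a local IIS at hand, keys of the same shape can be produced with a few lines of script. The sketch below is an alternative I am assuming would work equally well, not the method used above. It generates a 24-byte decryption key and a 64-byte validation key, matching the 48 and 128 hex characters in the transform. The function names are hypothetical, and the generated values should be treated as secrets rather than committed to source control.

```python
import secrets
from typing import Tuple


def generate_keys(decryption_bytes: int = 24, validation_bytes: int = 64) -> Tuple[str, str]:
    """Return (decryptionKey, validationKey) as upper-case hex strings."""
    decryption_key = secrets.token_hex(decryption_bytes).upper()
    validation_key = secrets.token_hex(validation_bytes).upper()
    return decryption_key, validation_key


def to_machine_key_element(decryption_key: str, validation_key: str) -> str:
    """Format the keys as a machineKey element using the same attributes as above."""
    return (
        '<machineKey validation="SHA1" decryption="AES" '
        f'decryptionKey="{decryption_key}" '
        f'validationKey="{validation_key}" />'
    )


if __name__ == "__main__":
    dec, val = generate_keys()
    print(to_machine_key_element(dec, val))
```

Every CD instance needs to receive the same generated pair; generating a fresh pair per container would reproduce the original problem.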
I hope this will help others who face this same issue, and if not I at least know where to find the answer next time!
Note: I am not 100% sure whether applying only the machine key would also have fixed the issue. This is something I still need to try, but time has not allowed me to do so yet.