How to Fix Certificate Issues in Sitecore Next.js Apps on Windows Containers in AKS

👋Introduction

I recently worked on a project that used Sitecore 10.3 XM with Sitecore Experience Edge for content delivery, the Sitecore JavaScript Rendering SDK (JSS) for the Next.js front end, and Sitecore Headless SXA (Sitecore Experience Accelerator) to set up the content tree. Next.js applications can be hosted on various services, depending on the prerendering mode and your preferences.

🤼‍♂️The Challenge

The problem started when we deployed our Sitecore Next.js application to a Windows pod (a Windows Server 2022 node pool running a Node.js server) in Azure Kubernetes Service (AKS). The pod was running, but the Node.js service inside it was not working properly: the Rendering Host (RH) pod showed a blank screen and returned HTTP status code 502 (Bad Gateway).

🕵️The Investigation

We looked through the service logs first in an attempt to identify the source of the issue. Service logs are a treasure trove of data and frequently reveal what's wrong with the system.

Sitecore Rendering Host (Front-end app Service)

The Sitecore Rendering Host service logs showed that the problem occurred while the front-end application was being built:

Registering generate-component-builder plugin component-builder
Registering generate-component-builder plugin components
Writing generate-component-builder plugins to C:\app\scripts\temp\generate-component-builder-plugins.ts
.
.
Registering config plugin scjssconfig
Registering config plugin sxa
.
.
.
Writing component builder to C:\app\src\temp\componentBuilder.ts
Fetching site information from https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge
> sitecore-jss-app@21.1.7 next:dev
> cross-env NODE_OPTIONS='--inspect' next dev


> sitecore-jss-app@21.1.7 start:watch-components
> ts-node --project tsconfig.scripts.json scripts/generate-component-builder/index.ts --watch
.
.
.
Watching for changes to component builder sources in src/components...
Starting inspector on 127.0.0.1:9229 failed: address already in use
Starting inspector on 127.0.0.1:9229 failed: address already in use
(node:15212) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
.
.
.
.
Fetching site information from https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge
##[error] sitecore-jss:multisite request: { url: 'https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge', headers...}

sitecore-jss:multisite response error: 'request to https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge failed, reason: unable to verify the first certificate'
Error fetching site information
FetchError: request to https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge failed, reason: unable to verify the first certificate
    at ClientRequest.<anonymous> (C:\Agent2\_work\28\s\src\UI\nxtjs\node_modules\node-fetch\lib\index.js:)
    at ClientRequest.emit (node:events:513:28)
    at ClientRequest.emit (node:domain:489:12)
    at TLSSocket.socketErrorListener (node:_http_client:494:9)
    at TLSSocket.emit (node:events:513:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  type: 'system',
  errno: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE',
  code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE'
.
.
.
.
sitecore-jss:multisite response error: 'request to https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge failed, reason: write EPROTO F03C0000:error:0A000458:SSL routines:ssl3_read_bytes:tlsv1 unrecognized name:c:\\ws\\deps\\openssl\\openssl\\ssl\\record\\rec_layer_s3.c:1586:SSL alert number 112\n'
    at ClientRequest.<anonymous> (C:\app\node_modules\node-fetch\lib\index.js:1505:11)
    at ClientRequest.emit (node:events:300:28)
    at ClientRequest.emit (node:domain:477:12)
    at TLSSocket.socketErrorListener (node:_http_client:400:9)
    at TLSSocket.emit (node:events:517:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'EPROTO',
  code: 'EPROTO'
}

The logs above show three distinct problems:

  1. Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.

  2. unable to verify the first certificate

  3. write EPROTO (SSL alert number 112: tlsv1 unrecognized name)
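The first warning means the process was started with NODE_TLS_REJECT_UNAUTHORIZED=0, which switches off certificate verification for every HTTPS request Node.js makes; it hides chain problems instead of fixing them. As a minimal sketch (the helper names are hypothetical and not part of JSS), a start-up check could flag that setting and map the error codes from the logs to a likely cause:

```typescript
// Hypothetical start-up guard; these helpers are not part of the Sitecore JSS SDK.

/** Returns a warning if TLS certificate verification has been disabled, otherwise null. */
function checkTlsVerification(env: Record<string, string | undefined>): string | null {
  if (env.NODE_TLS_REJECT_UNAUTHORIZED === "0") {
    return "NODE_TLS_REJECT_UNAUTHORIZED=0 disables certificate verification for all HTTPS requests";
  }
  return null;
}

/** Maps Node.js TLS error codes seen in the logs to a likely cause (an assumption, not an exhaustive list). */
function explainTlsError(code: string | undefined): string {
  switch (code) {
    case "UNABLE_TO_VERIFY_LEAF_SIGNATURE":
      // Node.js message: "unable to verify the first certificate"
      return "The server certificate could not be linked to a trusted issuer; an intermediate certificate is likely missing or out of order.";
    case "UNABLE_TO_GET_ISSUER_CERT_LOCALLY":
      // Node.js message: "unable to get local issuer certificate"
      return "The issuing CA is not in the CA list trusted by Node.js.";
    case "EPROTO":
      return "The TLS handshake failed at the protocol level; alert 112 (unrecognized_name) points to an SNI host-name problem.";
    default:
      return "Unrecognized TLS error code.";
  }
}
```

For example, checkTlsVerification(process.env) can run before the build starts, and explainTlsError(error.code) can be applied to a node-fetch FetchError like the one in the logs. The mapping is an assumption based on the messages above: UNABLE_TO_VERIFY_LEAF_SIGNATURE is the code behind "unable to verify the first certificate", and TLS alert 112 (unrecognized_name) is sent when the server does not recognize the host name supplied through Server Name Indication (SNI).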

Sitecore Next.js Code Base

We verified the following settings in the front-end code base:

  1. Sitecore API Endpoint (SITECORE_API_HOST)

  2. Front-end Website Public Facing URL (PUBLIC_URL)

  3. Sitecore API Key (SITECORE_API_KEY)

  4. Edge GraphQL Endpoint (GRAPH_QL_ENDPOINT), used when accessing data from Sitecore Experience Edge (XE)
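These values are easy to get wrong between environments. A minimal sketch of a pre-build check, assuming the variable names above and that GRAPH_QL_ENDPOINT is only required when reading from Experience Edge; findConfigProblems is a hypothetical helper:

```typescript
// Hypothetical pre-build check; the variable names mirror the settings listed above.
const REQUIRED_VARS = ["SITECORE_API_HOST", "PUBLIC_URL", "SITECORE_API_KEY"];
const URL_VARS = new Set(["SITECORE_API_HOST", "PUBLIC_URL", "GRAPH_QL_ENDPOINT"]);

/** Returns a list of configuration problems; an empty list means the settings look usable. */
function findConfigProblems(env: Record<string, string | undefined>, usesEdge: boolean): string[] {
  const names = usesEdge ? [...REQUIRED_VARS, "GRAPH_QL_ENDPOINT"] : REQUIRED_VARS;
  const problems: string[] = [];

  for (const name of names) {
    const value = env[name]?.trim();
    if (!value) {
      problems.push(`${name} is not set`);
      continue;
    }
    if (URL_VARS.has(name)) {
      try {
        // URL-valued settings must be absolute URLs such as https://cm.example.com
        new URL(value);
      } catch {
        problems.push(`${name} is not a valid absolute URL: ${value}`);
      }
    }
  }
  return problems;
}
```

For example, a pre-build script could call findConfigProblems(process.env, true) and fail the pipeline step when the returned list is not empty.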

Sitecore CMS GraphQL playground

We ran the Edge GraphQL (GQL) queries in the preview IDE at https://[WEBSITE HOST ADDRESS]/sitecore/api/graph/edge/ui against the preview API at https://[WEBSITE HOST ADDRESS]/sitecore/api/graph/edge (replace [WEBSITE HOST ADDRESS] with your Sitecore CM instance host address). The queries worked as expected.

Verify Edge GraphQL Queries using Postman

We also ran the Edge GraphQL (GQL) queries against the Sitecore CM endpoint from the Postman client. This failed with the error unable to verify the first certificate.

To work around this, we disabled SSL certificate verification in Postman and were able to connect to the CM instance (https://enlightenwithamit.hashnode.dev). However, the Edge GQL endpoint (https://enlightenwithamit.hashnode.dev/sitecore/api/graph/edge?sc_apikey=xxxxxx) returned null for the Home page layout details with the following query:

query {
  layout(site: "my-jss-app", routePath: "/", language: "en") {
    item {
      rendered
    }
  }
}
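Running the same query from Node.js on the Agent PC reproduces the request the build makes, with the same TLS trust settings. A minimal sketch, assuming Node.js 18 or later (global fetch) and that the endpoint accepts the API key in an sc_apikey header, as the JSS GraphQL request client sends it (the Postman test above passed it as a query-string parameter instead):

```typescript
// Sketch: send the layout query from Node.js to reproduce the build's TLS behaviour.
const LAYOUT_QUERY = `query {
  layout(site: "my-jss-app", routePath: "/", language: "en") {
    item {
      rendered
    }
  }
}`;

interface LayoutRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildLayoutRequest(endpoint: string, apiKey: string): LayoutRequest {
  return {
    url: endpoint,
    init: {
      method: "POST",
      // Assumption: the endpoint accepts the API key in the sc_apikey header.
      headers: { "Content-Type": "application/json", sc_apikey: apiKey },
      body: JSON.stringify({ query: LAYOUT_QUERY }),
    },
  };
}

async function fetchLayout(endpoint: string, apiKey: string): Promise<unknown> {
  const { url, init } = buildLayoutRequest(endpoint, apiKey);
  try {
    const response = await fetch(url, init);
    const json = (await response.json()) as { data?: { layout?: { item?: unknown } | null } };
    // null means no layout was resolved for this site name, route, and language.
    return json.data?.layout?.item ?? null;
  } catch (error) {
    // Node's built-in fetch reports TLS failures in error.cause (for example, code UNABLE_TO_VERIFY_LEAF_SIGNATURE).
    const cause = (error as { cause?: { code?: string; message?: string } }).cause;
    throw new Error(`Layout request failed: ${cause?.code ?? "unknown"} ${cause?.message ?? String(error)}`);
  }
}
```

Node's built-in fetch is not node-fetch (which JSS uses in the logs above), but both rely on the same Node.js TLS stack, so a chain problem shows up in either.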

Node.js Version

We updated the Node.js version, and the build succeeded. However, the front-end service logs then showed the error unable to verify the first certificate.

Agent PC

On the Agent PC (Windows machine in the build pipeline), we demonstrated the behaviour to the Sitecore Support Team and performed the following steps:

  • Explained the issue details to the Sitecore Support Team

  • Re-ran the PowerShell script in the DevOps pipeline to reproduce the error

  • Verified the path and extension of the certificate (such as PEM, CER, etc.)

  • Checked the path of the certificate in the Next.js front-end app’s package.json file

  • Reviewed the environment file to check the certificate path

  • Checked the Agent machine's trusted root certificate store (ROOT) for the root certificate

  • Downloaded the certificate PEM file and rebuilt the front-end app, but encountered the same issue

  • The Sitecore Support Team referred us to the knowledge base article Troubleshooting JSS Next.js apps. We followed its guidance on configuring Sitecore CA certificates for Node.js and updated the PowerShell script and package.json file accordingly, but the issue persisted

  • We downloaded the certificate from Azure Key Vault and imported it onto the Agent machine

  • We ran the PowerShell scripts from the front-end build pipeline step and faced the same issue
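Checking the extension alone does not show what a certificate file contains. A minimal sketch for inspecting a PEM file, assuming a text (Base64) PEM bundle; summarizePem and summarizePemFile are hypothetical helpers that count the certificates in the file and flag any private key in it:

```typescript
import { readFileSync } from "node:fs";

interface PemSummary {
  certificates: string[]; // each entry is one full BEGIN/END CERTIFICATE block
  hasPrivateKey: boolean;
}

function summarizePem(pemText: string): PemSummary {
  const certificates =
    pemText.match(/-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/g) ?? [];
  // Matches "BEGIN PRIVATE KEY", "BEGIN RSA PRIVATE KEY", "BEGIN ENCRYPTED PRIVATE KEY", and similar.
  const hasPrivateKey = /-----BEGIN [A-Z ]*PRIVATE KEY-----/.test(pemText);
  return { certificates, hasPrivateKey };
}

function summarizePemFile(path: string): PemSummary {
  return summarizePem(readFileSync(path, "utf8"));
}
```

A single certificate in a file that is supposed to carry the chain, or a private key where only public certificates are expected, is worth investigating before the next pipeline run.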

Application Gateway

For this SSL issue, we verified the following items on the Application Gateway:

  • DNS resolution and application gateway IP addresses

  • HTTP listener configuration on port 443

  • Bundled the certificate and verified it against the .crt stored in AKS

  • Network security group (NSG) inbound rules for the Application Gateway subnet

  • Installed OpenSSL and curl on the Agent PC (the Windows machine in the build pipeline) and ran the following commands:

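      # 1. Show the TLS handshake and every certificate the server presents (SNI set with -servername)
      openssl s_client -connect [WEBSITE HOST ADDRESS]:443 -servername [WEBSITE HOST ADDRESS] -showcerts
    
      # 2. Send a HEAD request and print verbose TLS and HTTP details
      curl -vI https://[WEBSITE HOST ADDRESS]
    
      # 3. Confirm DNS resolution and basic reachability
      ping [WEBSITE HOST ADDRESS]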
    

    Replace [WEBSITE HOST ADDRESS] with your Sitecore CM instance host address.

    After running the commands above, openssl reported the following error:

      depth=0 C = US, ST = California, O = Amit Kumar, CN = enlightenwithamit.hashnode.dev
      verify error:num=20:unable to get local issuer certificate
    

    This error means OpenSSL could not build a chain from the server certificate to a trusted root: the certificate chain presented by the server was incomplete or in the wrong order. Node.js reports the same condition as "unable to verify the first certificate" when the site is accessed.
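The openssl s_client check can also be run from Node.js, which by default trusts its own bundled CA list rather than the Windows certificate store, so it can report a different result than a browser on the same machine. A minimal sketch, assuming Node.js on the Agent PC; it sets SNI like -servername, lists the subject and issuer of each certificate in the chain, and reports why verification would fail. Verification is disabled only on this diagnostic connection so that a broken chain can still be inspected:

```typescript
import * as tls from "node:tls";

// Minimal shape of the fields read from tls.DetailedPeerCertificate.
interface ChainLink {
  subject: { CN?: string };
  issuer: { CN?: string };
  fingerprint256: string;
  issuerCertificate?: ChainLink;
}

/** Walks issuerCertificate links; a self-signed root points at itself, so stop on a repeat. */
function describeChain(leaf: ChainLink): string[] {
  const lines: string[] = [];
  const seen = new Set<string>();
  let current: ChainLink | undefined = leaf;
  while (current && !seen.has(current.fingerprint256)) {
    seen.add(current.fingerprint256);
    lines.push(`${lines.length}: subject=${current.subject.CN ?? "?"} issuer=${current.issuer.CN ?? "?"}`);
    current = current.issuerCertificate;
  }
  return lines;
}

function inspectServerChain(host: string, port = 443): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      // servername sets SNI; rejectUnauthorized: false is for diagnostics only.
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        const chain = describeChain(socket.getPeerCertificate(true));
        // authorizationError holds the reason normal verification would fail.
        if (!socket.authorized) chain.push(`verification error: ${String(socket.authorizationError)}`);
        socket.end();
        resolve(chain);
      },
    );
    socket.on("error", reject);
  });
}
```

For example, inspectServerChain("cm.example.com").then((lines) => console.log(lines.join("\n"))), where cm.example.com is a placeholder for your CM host.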

⚡The Solution

After a thorough analysis, we found that the problem was with the certificate chain, not the code.

The certificate chain was served in this order:

Server Cert >> Some other Cert >> Root Cert >> Intermediate cert

The correct order should be:

Server Cert >> Intermediate Cert >> Root Certificate
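The ordering rule is mechanical: each certificate in the bundle should be issued by the one that follows it. A minimal sketch that checks this by comparing issuer and subject names, assuming Node.js 15.6 or later for crypto.X509Certificate; findChainOrderProblem and toCertInfo are hypothetical helpers:

```typescript
import { X509Certificate } from "node:crypto";

interface CertInfo {
  subject: string;
  issuer: string;
}

/** Returns the index of the first certificate not issued by its successor, or -1 if the order is correct. */
function findChainOrderProblem(chain: CertInfo[]): number {
  for (let i = 0; i < chain.length - 1; i++) {
    // Correct order: Server Cert >> Intermediate Cert >> Root Certificate.
    if (chain[i].issuer !== chain[i + 1].subject) return i;
  }
  return -1;
}

/** Reads subject and issuer names from individual PEM certificate blocks. */
function toCertInfo(pemBlocks: string[]): CertInfo[] {
  return pemBlocks.map((pem) => {
    const cert = new X509Certificate(pem);
    return { subject: cert.subject, issuer: cert.issuer };
  });
}
```

Comparing names is a quick heuristic; X509Certificate also provides checkIssued() and verify() when the signatures themselves need to be confirmed. The PEM blocks can be split out of a bundle file with a regular expression on the BEGIN/END CERTIFICATE markers.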

To temporarily fix the issue, we followed these steps:

  • Exported the installed certificate from the Agent machine as a password-protected .PFX file, selecting all the export options

  • Converted the .PFX file into a .PEM file with the following command:

      openssl pkcs12 -in C:\Users\MyProjExportedCert.pfx -out MyProjCert.pem -nodes
    
  • Added the MyProjCert.pem file to the front-end code repository's certificate folder (for example, certificate/MyProjCert.pem).

    💡
    We should not store the certificate in the code repository. Instead, pass it in at deploy time from Key Vault or another secure source. Also note that -nodes writes any private key contained in the .PFX to the PEM file unencrypted; adding -nokeys exports only the certificates.
  • After making these changes, we ran the PowerShell script, and it completed without the unable to verify the first certificate error.

  • To fix the issue in Postman, we added the .PEM certificate to Postman's certificate settings. Postman then connected without the unable to verify the first certificate error.

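Committing the PEM file was a stopgap. Node.js can trust an extra CA without code changes through the NODE_EXTRA_CA_CERTS environment variable, which must point to a PEM file and be set before the process starts. For code that creates its own HTTPS agent, a minimal sketch, assuming a hypothetical SITECORE_CA_CERT_PATH variable that points at a PEM file mounted at deploy time (for example, from Key Vault):

```typescript
import { readFileSync } from "node:fs";
import * as https from "node:https";
import * as tls from "node:tls";

/** Reads the extra CA PEM from a path supplied at deploy time. SITECORE_CA_CERT_PATH is a hypothetical name. */
function loadExtraCa(env: Record<string, string | undefined>): string {
  const path = env.SITECORE_CA_CERT_PATH;
  if (!path) {
    throw new Error("SITECORE_CA_CERT_PATH is not set");
  }
  return readFileSync(path, "utf8");
}

/** Keeps Node.js's bundled root CAs and appends the extra PEM; a bare `ca` option would replace them. */
function buildCaList(extraCaPem: string): string[] {
  return [...tls.rootCertificates, extraCaPem];
}

/** HTTPS agent that still verifies certificates (rejectUnauthorized defaults to true). */
function createAgentWithExtraCa(extraCaPem: string): https.Agent {
  return new https.Agent({ ca: buildCaList(extraCaPem) });
}
```

Unlike NODE_TLS_REJECT_UNAUTHORIZED=0, this keeps certificate verification on. node-fetch, which appears in the stack traces above, accepts such an agent through its agent option.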

The permanent fix is to:

  • Obtain a certificate whose chain is in the correct order:

    Server Cert >> Intermediate Cert >> Root Certificate

    💡
    Storing the certificate in an AKS secret (.crt) is not ideal. It is better to store certificates in Azure Key Vault and reference the Key Vault from your Kubernetes Ingress YAML file.
    💡
    The .crt in the AKS secret can contain the entire certificate chain (CA, server, client, and intermediate certificates), but not the private key.

    After we obtained the certificate with the full chain and updated it in AKS, requests from Postman succeeded and the website worked without errors.
