Artifact-Shield Troubleshooting Guide
This guide describes the most common issues you'll encounter as a developer or operator of the Artifact-Shield gateway.
1. SSL & mTLS Failures
Scenario: Handshake error when calling a downstream LLM.
The Cause
- The `.p12` file path in the database is incorrect.
- The password for the keystore/truststore is invalid.
- The downstream server's certificate does not chain to a CA in our truststore (such as our Root CA).
The Fix
- Check Paths: Ensure the `keystore_path` in `shield_downstream_configs` is either absolute or correct relative to the application's working directory.
- Verify Trust: List the truststore's contents and confirm the issuing CA is present:
```bash
keytool -list -v -keystore ./certs/trust.p12
```
- Debug Handshake: Enable verbose SSL logging:
```bash
java -Djavax.net.debug=all -jar app.jar
```
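Before you dig into handshake logs, you can rule out the first two causes with a quick check. The sketch below is a standalone, JDK-only diagnostic and is not part of Artifact-Shield. It resolves a configured path the way the JVM does (relative paths resolve against the working directory) and then tries to open the file as a PKCS#12 store. The class name `KeystoreCheck` and its output labels are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.UnrecoverableKeyException;
import java.security.cert.CertificateException;
import java.util.Collections;

class KeystoreCheck {

    /** Resolves a configured path as the JVM would: relative paths use the working directory. */
    static Path resolve(String configuredPath) {
        return Path.of(configuredPath).toAbsolutePath().normalize();
    }

    /** Returns a one-line diagnosis for a PKCS#12 keystore or truststore. */
    static String diagnose(String configuredPath, char[] password) {
        Path path = resolve(configuredPath);
        if (!Files.isRegularFile(path)) {
            return "NOT_FOUND: " + path;
        }
        try (InputStream in = Files.newInputStream(path)) {
            KeyStore ks = KeyStore.getInstance("PKCS12");
            ks.load(in, password);
            return "OK: " + Collections.list(ks.aliases()); // aliases of stored certs/keys
        } catch (IOException e) {
            // A wrong password usually surfaces as an IOException caused by UnrecoverableKeyException
            if (e.getCause() instanceof UnrecoverableKeyException) {
                return "BAD_PASSWORD: " + path;
            }
            return "UNREADABLE: " + e.getMessage();
        } catch (KeyStoreException | NoSuchAlgorithmException | CertificateException e) {
            return "ERROR: " + e;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java KeystoreCheck.java <path-to.p12> <password>");
            return;
        }
        System.out.println(diagnose(args[0], args[1].toCharArray()));
    }
}
```

Run it from the gateway's working directory so that relative paths resolve the same way the application resolves them. For example: `java KeystoreCheck.java ./certs/trust.p12 <password>`.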
2. H2 Database Lock
Scenario: "Database may be already in use" when starting the application.
The Cause
- The application was stopped abruptly, and the H2 file lock (`.lock.db`) was not released.
- Another instance of the app (or a DB client) is accessing the `shielddb.mv.db` file.
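The Fix
- Check Processes: Stop any leftover gateway instances or DB clients still using the database (`jps -l` lists running Java processes).
- Delete Lock File: Once nothing holds the database, manually delete `~/shielddb.lock.db`.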
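Deleting the lock file is only safe when nothing still holds the database. The sketch below performs that check. It assumes that H2 protects `shielddb.mv.db` with an OS-level file lock and that the database file sits in your home directory next to `~/shielddb.lock.db`. It is a heuristic diagnostic, not an H2 API. Run it as a separate process while you believe the gateway is stopped. Inside a single JVM, closing a second channel on the same file can release locks that other channels hold on some platforms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class H2LockCheck {

    /**
     * Returns true if another process currently holds an OS-level lock on the file.
     * Heuristic only: assumes the database engine locks the file it has open.
     */
    static boolean isLocked(Path dbFile) {
        try (FileChannel ch = FileChannel.open(dbFile, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock(); // non-blocking; null means another process holds it
            if (lock == null) {
                return true;
            }
            lock.release(); // we acquired it, so nobody else held it
            return false;
        } catch (OverlappingFileLockException e) {
            return true; // this JVM already holds a lock (not expected when run standalone)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: database and lock file both live in the user's home directory.
        Path home = Path.of(System.getProperty("user.home"));
        Path db = home.resolve("shielddb.mv.db");
        Path lockFile = home.resolve("shielddb.lock.db");

        if (!Files.exists(db)) {
            System.out.println("No database file at " + db);
        } else if (isLocked(db)) {
            System.out.println("Database is locked by a running process: stop it instead of deleting files.");
        } else if (Files.exists(lockFile)) {
            System.out.println("No live lock found; " + lockFile + " looks stale and can be deleted.");
        } else {
            System.out.println("No lock detected.");
        }
    }
}
```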
3. Swagger UI (OpenAPI) 404
Scenario: `http://localhost:8080/swagger-ui.html` returns a 404.
The Cause
- Springdoc version incompatibility with WebFlux (resolved in `2.5.0+`).
- The reactive security filter is blocking the `/swagger-ui/**` path.
The Fix
- Update Security: Inside `SecurityConfig.java`, ensure you have:
```java
// Within the authorizeExchange(...) rules of the SecurityWebFilterChain
.pathMatchers("/swagger-ui/**", "/v3/api-docs/**", "/swagger-ui.html").permitAll()
```
- Clear Cache: Browsers sometimes cache the 404 for `/swagger-ui/index.html`. Open the page in a private window.
4. Low Throughput / Event-Loop Blocking
Scenario: High latency even for small requests.
The Cause
- Blocking I/O: You or a new contributor added a blocking call (such as `Files.readString()`) on an event-loop thread, stalling every request served by that thread.
- Insufficient Memory: The Java heap is too small for large redaction regexes, leading to frequent garbage-collection pauses.
The Fix
- Check BlockHound: Integrate BlockHound into your tests to detect blocking calls automatically.
- Increase Heap: Use `-Xmx2G` or higher for high-concurrency production loads.
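The usual remedy for blocking I/O is to move the blocking call off the event loop and onto a thread pool meant for blocking work. In a Reactor pipeline, this is typically `Mono.fromCallable(...)` combined with `subscribeOn(Schedulers.boundedElastic())`. The JDK-only sketch below shows the same idea with `CompletableFuture` and a dedicated executor, so it runs without Reactor on the classpath. The pool size of 4 and the class name are arbitrary assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadBlockingIo {

    // Dedicated, bounded pool for blocking work, kept separate from the few event-loop threads.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true); // lets the JVM exit without an explicit shutdown
        return t;
    });

    /** Returns immediately; the actual file read happens on the blocking pool. */
    static CompletableFuture<String> readAsync(Path path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.readString(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e); // surfaces as a CompletionException on join()
            }
        }, BLOCKING_POOL);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("patterns", ".txt");
        Files.writeString(tmp, "PASSWORD=.*");
        // The calling thread stays free while the read runs elsewhere.
        readAsync(tmp).thenAccept(content -> System.out.println("Loaded: " + content)).join();
    }
}
```

A bounded pool keeps a burst of slow reads from spawning unlimited threads, while the event-loop threads keep serving other requests.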
5. "401 Unauthorized" from Gemini/OpenAI
Scenario: Proxied LLM call fails even after redaction.
The Cause
- The `auth_token` in `shield_downstream_configs` has expired.
- The LLM provider's token format has changed.
The Fix
- Test Token: Use `curl` to test the token directly from the gateway server:
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" https://api.openai.com/v1/...
```
- Update DB: Update the token at runtime using SQL:
```sql
UPDATE shield_downstream_configs SET auth_token = 'new_token' WHERE alias = 'gemini';
```
6. "CORS Error" in Admin Dashboard
Scenario: The dashboard loads, but statistics and patterns show "Connection Refused" or "CORS Error".
The Cause
- The gateway has `shield.security.enabled: true` but `shield.security.cors-enabled` is `false`.
- The browser is blocking the request because the dashboard is on a different domain/port.
The Fix
- Enable CORS: In `application.yml`, set `shield.security.cors-enabled: true`.
- Origin Policy: If you are using a reverse proxy (like Nginx), ensure headers are properly propagated:
```nginx
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
```
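To confirm whether the gateway actually answers CORS preflights, you can replay what the browser does. Send an `OPTIONS` request with `Origin` and `Access-Control-Request-Method` headers, then inspect `Access-Control-Allow-Origin` in the response. The JDK-only sketch below does this. The endpoint URL and dashboard origin are hypothetical placeholders that you pass as arguments. Run it on a current JDK, because the Java 11 HTTP client may refuse to set the `Origin` header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

class CorsPreflightProbe {

    /** Builds the OPTIONS request a browser sends before a cross-origin call. */
    static HttpRequest preflight(URI target, String origin, String method) {
        return HttpRequest.newBuilder(target)
                .method("OPTIONS", HttpRequest.BodyPublishers.noBody())
                .header("Origin", origin)
                .header("Access-Control-Request-Method", method)
                .build();
    }

    /**
     * True if the Access-Control-Allow-Origin value admits the origin.
     * Simplified: ignores the rule that credentialed requests may not use "*".
     */
    static boolean allows(String origin, Optional<String> allowOrigin) {
        return allowOrigin.map(v -> v.equals("*") || v.equals(origin)).orElse(false);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults: pass your gateway endpoint and the dashboard's origin instead.
        URI target = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/");
        String origin = args.length > 1 ? args[1] : "http://localhost:3000";
        HttpResponse<Void> resp = HttpClient.newHttpClient()
                .send(preflight(target, origin, "GET"), HttpResponse.BodyHandlers.discarding());
        Optional<String> allowOrigin = resp.headers().firstValue("Access-Control-Allow-Origin");
        System.out.println("Status " + resp.statusCode()
                + ", Access-Control-Allow-Origin: " + allowOrigin.orElse("<missing>"));
        System.out.println(allows(origin, allowOrigin) ? "Preflight OK" : "Browser would block this origin");
    }
}
```

If the header is missing or does not match the dashboard's origin, the browser blocks the call even though the gateway itself is reachable.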
7. Downstream LLM Timeouts
Scenario: Sanitization works, but `llmResponse` is "Error: Read Timeout".
The Cause
- The downstream LLM (e.g., GPT-4) is taking too long to generate a response for a large prompt.
- The gateway's `WebClient` timeout is set too low (default: 30s).
The Fix
- Increase Timeout: If you are using a custom `DownstreamProxyClient`, you can increase the `WebClient` response timeout.
- Check Upstream Status: Verify whether the LLM provider is currently experiencing an outage.
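How you raise the timeout depends on how `DownstreamProxyClient` builds its `WebClient`, which this guide does not show. With the Reactor Netty connector, the setting is typically `HttpClient.create().responseTimeout(...)`, passed to `WebClient.builder().clientConnector(new ReactorClientHttpConnector(...))`. As a dependency-free illustration of the same two settings (the connect timeout versus the time allowed for the response), the sketch below uses the JDK's `java.net.http` equivalent. The 10 s and 120 s values are assumptions, not recommended defaults.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

class DownstreamTimeoutSketch {

    // Hypothetical values: connecting should be quick, but generating a long completion may not be.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(10);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(120);

    /** Client-wide limit on establishing the TCP/TLS connection. */
    static HttpClient client() {
        return HttpClient.newBuilder().connectTimeout(CONNECT_TIMEOUT).build();
    }

    /** Per-request limit on how long to wait for the response. */
    static HttpRequest request(URI uri, String jsonBody) {
        return HttpRequest.newBuilder(uri)
                .timeout(RESPONSE_TIMEOUT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DownstreamTimeoutSketch.java <downstream-url> [json-body]");
            return;
        }
        String body = args.length > 1 ? args[1] : "{}";
        try {
            HttpResponse<String> resp = client().send(request(URI.create(args[0]), body),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + resp.statusCode());
        } catch (HttpTimeoutException e) { // also covers HttpConnectTimeoutException
            System.out.println("Timed out: " + e.getMessage());
        }
    }
}
```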
8. Rapid Log File Growth
Scenario: The gateway's disk space is filling up with log files.
The Cause
- The logging level is set to `DEBUG` or `TRACE` in production.
- The gateway is processing thousands of requests per minute with verbose auditing.
The Fix
- Adjust Level: In `application.yml`, ensure logging is set to `INFO` or `WARN` for production:
```yaml
logging:
  level:
    io.dhoondlay.shield: INFO
    org.springframework: WARN
```
- Configure Rotation: Use a standard `logback-spring.xml` to enable daily log rotation and compression.
9. H2 Console "403 Forbidden"
Scenario: Clicking the H2 Console link in the dashboard shows a white screen or a "403" error.
The Cause
- Spring Security is blocking the `/h2-console/**` path.
- CSRF protection is rejecting the console's form submissions, or frame protection (`X-Frame-Options: DENY`) is blocking the H2 console's frames.
The Fix
- Permit Access: In `SecurityConfig.java`, ensure `/h2-console/**` is in the `permitAll()` list.
- Disable Frame Protection: The H2 console renders inside frames, so it needs frame-same-origin (`X-Frame-Options: SAMEORIGIN`) to function. Artifact-Shield handles this automatically in the default `SecurityConfig`, so verify that your custom rules haven't overridden it.
10. Missing Traceability (No Correlation ID)
Scenario: Splunk is receiving logs, but multiple log lines for the same request are not linked.
The Cause
- The `CorrelationIdFilter` is not being executed (often because it was disabled in a custom filter chain).
- MDC (Mapped Diagnostic Context) is lost: MDC is thread-bound, and a library that doesn't support Reactor Context continued the work on a different thread.
The Fix
- Check Filter Order: Ensure `CorrelationIdFilter` runs at the beginning of the `WebFilter` chain.
- Reactor Context: Keep request-scoped values in the Reactor Context (the approach `ReactiveSecurityContextHolder` takes) so they stay within the reactive pipeline instead of depending on thread-local state.
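MDC values vanish because MDC is backed by a `ThreadLocal`. A value set on one thread is invisible once the pipeline continues on another. The JDK-only sketch below reproduces the problem with a plain `ThreadLocal` standing in for MDC, and it shows the capture-and-restore wrapping that context-propagation mechanisms perform. In Reactor, you would normally let the framework do this (for example, through the Reactor Context together with `Hooks.enableAutomaticContextPropagation()` in recent Reactor versions) rather than wrap tasks by hand. The class and method names here are illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CorrelationPropagation {

    // Stand-in for MDC: like MDC, it is backed by a ThreadLocal.
    static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();

    /** Captures the caller's correlation ID now and restores it on whichever thread runs the task. */
    static <T> Callable<T> withCorrelationId(Callable<T> task) {
        String captured = CORRELATION_ID.get();
        return () -> {
            String previous = CORRELATION_ID.get();
            CORRELATION_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker thread's prior state so pooled threads don't leak IDs.
                if (previous == null) CORRELATION_ID.remove(); else CORRELATION_ID.set(previous);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CORRELATION_ID.set("req-42");
            String lost = pool.submit(() -> CORRELATION_ID.get()).get();
            String kept = pool.submit(withCorrelationId(() -> CORRELATION_ID.get())).get();
            System.out.println("without propagation: " + lost); // null
            System.out.println("with propagation:    " + kept); // req-42
        } finally {
            pool.shutdown();
        }
    }
}
```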
11. My Regex Pattern Is Not Redacting
Scenario: You added a new rule for `PASSWORD`, but it's not being replaced in the text.
The Cause
- Case Sensitivity: The regex is case-sensitive (e.g., it matches `password` but not `Password`).
- Lookarounds: You used complex lookarounds that the standard Java regex engine does not support (for example, a lookbehind with no bounded maximum length) or that do not behave as expected in a streaming context.
- Detector disabled: The parent detector category is disabled in `application.yml`.
The Fix
- Test the Pattern: Use a tool like Regex101 (Java mode) with your string.
- Enable Flags: Add `(?i)` to the beginning of your regex to make it case-insensitive.
- Check Category: Verify `shield.detectors.<category>.enabled` is `true`.
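To see the case-sensitivity issue in isolation, you can run the pattern through `java.util.regex` directly. The `password=\S+` pattern and the `[REDACTED]` replacement below are hypothetical stand-ins for your rule, not Artifact-Shield's rule format. The inline `(?i)` flag and the `Pattern.CASE_INSENSITIVE` compile flag are equivalent.

```java
import java.util.regex.Pattern;

class RedactionPatternCheck {

    // Case-sensitive: matches "password=..." but not "Password=..."
    static final Pattern SENSITIVE = Pattern.compile("password=\\S+");
    // Inline (?i) flag at the start makes the whole pattern case-insensitive
    static final Pattern INLINE_FLAG = Pattern.compile("(?i)password=\\S+");
    // Same effect, set as a compile-time flag instead
    static final Pattern COMPILE_FLAG = Pattern.compile("password=\\S+", Pattern.CASE_INSENSITIVE);

    static String redact(Pattern p, String text) {
        return p.matcher(text).replaceAll("[REDACTED]");
    }

    public static void main(String[] args) {
        String input = "user=alice Password=hunter2";
        System.out.println(redact(SENSITIVE, input));    // user=alice Password=hunter2
        System.out.println(redact(INLINE_FLAG, input));  // user=alice [REDACTED]
        System.out.println(redact(COMPILE_FLAG, input)); // user=alice [REDACTED]
    }
}
```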
12. "Table not found" after migrating to Postgres
Scenario: You switched to PostgreSQL, but the app says relation "shield_patterns" does not exist.
The Cause
- Hibernate `ddl-auto` is set to `none` or `validate`.
- The Postgres user doesn't have permission to create tables in the schema.
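The Fix
- Grant Permissions: Ensure the user is a `superuser` or has `CREATE` rights on the target schema (PostgreSQL 15 and later no longer grant `CREATE` on the `public` schema to all users by default).
- Set Auto-DDL: Temporarily set `spring.jpa.hibernate.ddl-auto: update` to let the app build the schema on its first run.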
For more help, contact your security engineer or visit the [Artifact-Shield internal wiki].