What does it mean when AMPS has detected that it may not be running correctly?

The AMPS server is highly concurrent, which means that the server itself has many threads doing work at the same time.

AMPS includes a thread monitor that keeps track of how fast AMPS threads are processing the work provided. If a thread does not report progress when expected, AMPS logs message 30-0000. This message means that the thread may be stuck (unable to make progress) or that an operation is taking longer than expected. For example, an operation involving a large file may take an unexpectedly long time on a busy SAN and cause AMPS to report that the thread performing the operation is potentially stuck. Likewise, if an AMPS instance shares a server with other processes that require large amounts of CPU, network, or storage capacity, AMPS may report a potentially stuck thread when it cannot acquire enough resources to make progress in a timely manner.
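The general pattern described above, where workers report progress and a monitor flags any worker that has gone quiet, can be sketched in a few lines of Python. This is only an illustration of the idea, not how AMPS implements its thread monitor: the class name, method names, and the `expected_interval` parameter are all hypothetical.

```python
import threading
import time


class ThreadMonitor:
    """Toy progress monitor (hypothetical, not the AMPS implementation)."""

    def __init__(self, expected_interval, clock=time.monotonic):
        # Seconds a thread may go without reporting (assumed value, chosen by the caller).
        self.expected_interval = expected_interval
        self.clock = clock
        self._last_report = {}
        self._lock = threading.Lock()

    def report_progress(self, thread_name):
        # A worker calls this each time it completes a unit of work.
        with self._lock:
            self._last_report[thread_name] = self.clock()

    def overdue_threads(self):
        # A thread is "potentially stuck" when its last report is older
        # than the expected interval. It may simply be slow.
        now = self.clock()
        with self._lock:
            return sorted(
                name
                for name, seen in self._last_report.items()
                if now - seen > self.expected_interval
            )
```

Note that the monitor cannot tell a stuck thread from a slow one: a worker waiting on a busy disk looks the same as a worker that will never return, which is why the message describes the thread as potentially stuck.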

If a thread continues to fail to report progress, AMPS begins to generate diagnostic minidumps that capture the current position in the AMPS code for every thread in the instance. Troubleshooting a slow thread is easiest with more than one snapshot of where code execution is in each thread. For that reason, AMPS begins to produce diagnostic minidumps as soon as the server detects that a problem may be emerging, before the problem has reached a critical level.

AMPS logs error 01-0022 when it generates a diagnostic minidump. AMPS logs this error at critical level because, at the time the error is logged, the thread has not reported for long enough that AMPS will shut down if the problem persists. AMPS is often able to recover from this particular error, but the error typically indicates a capacity problem with the system hosting AMPS or a problem in the AMPS server.
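The escalation described so far (a 30-0000 message, then 01-0022 with minidumps, then shutdown if the problem persists) can be pictured as a simple threshold ladder. The sketch below is an assumption-laden illustration: the function name and all three time thresholds are invented for the example and do not reflect actual AMPS settings or defaults.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    LOG_30_0000 = "log message 30-0000"
    MINIDUMP_01_0022 = "log error 01-0022 and write a diagnostic minidump"
    SHUTDOWN = "shut down the server"


def escalation(seconds_silent, warn_after=10.0, dump_after=30.0, shutdown_after=120.0):
    """Map how long a thread has been silent to a response.

    All thresholds are hypothetical values chosen for illustration only.
    """
    if seconds_silent >= shutdown_after:
        return Action.SHUTDOWN
    if seconds_silent >= dump_after:
        return Action.MINIDUMP_01_0022
    if seconds_silent >= warn_after:
        return Action.LOG_30_0000
    return Action.NONE
```

The gap between the minidump stage and the shutdown stage is what gives the server room to collect several snapshots, and to recover if the thread starts reporting again.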

In many cases, the server recovers from a temporarily slow thread without incident. In those cases, the AMPS server logs indicating which thread was slow may provide the most useful information on the slowdown. In cases where the server shuts down due to a thread becoming stuck, both the logs and minidump files should be provided to support.
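When reviewing logs after a slowdown, a quick first step is to count how often each of the two message IDs appears. The helper below is a minimal sketch that matches the IDs as plain substrings, because it makes no assumption about the layout of an AMPS log line; the function name is hypothetical.

```python
MESSAGE_IDS = ("30-0000", "01-0022")


def count_stuck_thread_messages(lines):
    """Count log lines that mention each stuck-thread message ID."""
    counts = {message_id: 0 for message_id in MESSAGE_IDS}
    for line in lines:
        for message_id in MESSAGE_IDS:
            if message_id in line:
                counts[message_id] += 1
    return counts
```

Because the function accepts any iterable of strings, an open log file can be passed to it directly. Many 30-0000 entries with no 01-0022 entries suggest threads that were slow but recovered before minidumps began.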
