What are some ways to consume large SOW queries?

AMPS has sophisticated, configurable slow consumer mitigation that protects AMPS and other consumers from performance degradation caused by a single over-subscribed consumer. The default slow consumer settings may not be enough for large SOW queries, where the consumer sees large bursts of over-subscription followed by long quiet periods of processing.

Imagine a consumer wants to execute a SOW query that returns 3 million records with an average size of 1024 bytes. AMPS would need more than 3GB of storage to hold the results while the consumer pulled all of that data over the network, and by default the query would result in an almost immediate slow consumer disconnect event.

Side Note: A SOW query of 3 million records with an average record size of 1024 bytes sends a bit over 3GB of data to the client. The client needs about 30 seconds to pull that across a 1Gbps network link (or about 3 seconds over a 10Gbps link), and that estimate assumes that the message handling code can keep up with the line rate.
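The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough calculation only: the function name is hypothetical, it counts payload bytes alone, and it assumes the link runs at full line rate. Per-message metadata and protocol overhead push the real figure higher, which is consistent with the rounded 30 second estimate.

```python
def transfer_estimate(record_count, avg_record_bytes, link_gbps):
    """Return (total_bytes, seconds) for sending a result set at full line rate.

    Counts payload only; headers and protocol overhead are ignored.
    """
    total_bytes = record_count * avg_record_bytes
    bits_per_second = link_gbps * 1_000_000_000
    seconds = total_bytes * 8 / bits_per_second
    return total_bytes, seconds


if __name__ == "__main__":
    for gbps in (1, 10):
        total, secs = transfer_estimate(3_000_000, 1024, gbps)
        print(f"{total / 1e9:.2f} GB over {gbps} Gbps: about {secs:.1f} s")
```

For the example query, the payload alone is 3,072,000,000 bytes, which is roughly 25 seconds at 1Gbps before any overhead.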

Pagination

In current versions of AMPS, support for pagination makes it possible to retrieve a specific part of a larger result set. The command uses the skip_n and top_n options to specify which part of the result set to return. This is particularly useful for applications that use only a part of the overall data at a given time. For example, if the SOW contains 3 million rows, but the application only displays 500 rows at a time, the application may never need to request the full set of records from the SOW. (Notice, however, that these options apply strictly to the SOW query, and not to subsequent updates. If an application needs updates to the query, the methods below can be more useful.)
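As an illustration of paging through a result set, the sketch below computes the skip_n and top_n values for a given page. The helper name is hypothetical, and the comma-separated name=value form of the returned string is an assumption; check the AMPS command reference for the exact syntax your server version accepts. Pages are only consistent from one request to the next if the query returns records in a stable order.

```python
def page_options(page_index, page_size):
    """Build an options string selecting one zero-based page of a SOW result set.

    The "skip_n=...,top_n=..." layout is an assumed format, used here
    only to show how the two values relate to the page being requested.
    """
    if page_index < 0:
        raise ValueError("page_index must be zero or greater")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = page_index * page_size  # records to pass over before this page
    return f"skip_n={skip},top_n={page_size}"


if __name__ == "__main__":
    # An application that shows 500 rows at a time, requesting the fourth page.
    print(page_options(3, 500))
```

With a 500-row display, the fourth page skips the first 1500 records and returns the next 500, so the application never pulls the full 3 million rows.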

Slow Consumer Mitigation

In current versions of AMPS, slow consumer mitigation contains settings for both individual clients and for resource consumption across the instance as a whole. For most applications, these changes make slow client disconnection much less frequent.

MessageMemoryLimit Sets the maximum amount of memory to use for messages (for this Transport or the instance as a whole). (units = bytes, default = 10% of total host memory or 10% of the amount of memory AMPS is allowed to consume, whichever is lower)

MessageDiskLimit Sets the maximum amount of disk space to use for messages (for either this Transport or for the instance as a whole). (units = bytes, default = 1GB or the size of the MessageMemoryLimit, whichever is higher)

ClientMaxCapacity Sets the share of the available limit capacity that a single client can consume. (units = percentage, default = 100%)

ClientMessageAgeLimit Sets the maximum age of the oldest message held for the client. If the oldest message AMPS has buffered for this client is older than this limit, AMPS disconnects the client. (units = time interval, default = unlimited)

Recommended sizing:

MessageMemoryLimit 60East recommends leaving this configuration parameter at the default where possible. If more space is required, increase the parameter by 1-2% at a time. Use caution with settings greater than 20%.

MessageDiskLimit To estimate the message disk limit, start with MaxResultSize * (1.0 + 150/AverageRecordSize) * NumberOfSimultaneousClients, or the MessageMemoryLimit, whichever is greater. If clients are still offlined, 60East recommends growing this limit in increments of MaxResultSize * (1.0 + 150/AverageRecordSize).
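The sizing formula can be worked through with a short Python sketch. The function names are hypothetical, and the 6.4GB memory limit in the example is an assumed value (10% of a 64GB host), not a figure from this article. The constant 150 comes directly from the formula; treating it as a per-record term in bytes is an interpretation.

```python
import math

FORMULA_CONSTANT = 150  # taken from the sizing formula; assumed to be bytes per record


def per_client_increment(max_result_bytes, avg_record_bytes):
    """MaxResultSize * (1.0 + 150/AverageRecordSize), rounded up to whole bytes."""
    return math.ceil(max_result_bytes * (1.0 + FORMULA_CONSTANT / avg_record_bytes))


def estimate_message_disk_limit(max_result_bytes, avg_record_bytes,
                                simultaneous_clients, message_memory_limit_bytes):
    """Starting MessageDiskLimit: the formula result or the memory limit, whichever is greater."""
    from_formula = per_client_increment(max_result_bytes, avg_record_bytes) * simultaneous_clients
    return max(from_formula, message_memory_limit_bytes)


if __name__ == "__main__":
    result_bytes = 3_000_000 * 1024  # the 3 million record example query
    assumed_memory_limit = 6_400_000_000  # assumption: 10% of a 64GB host
    print(per_client_increment(result_bytes, 1024))
    print(estimate_message_disk_limit(result_bytes, 1024, 4, assumed_memory_limit))
```

For the example query, each client adds 3,522,000,000 bytes, so four simultaneous clients suggest a starting limit of about 14GB, and the same per-client figure is the increment to grow by if clients are still offlined.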

Adjusting these parameters can help prevent AMPS from disconnecting clients that run queries with a large number of results. You can additionally use the BatchSize parameter on your SOW query to make delivery of the results more efficient. Batch size is a trade-off: a larger batch size means fewer batches and less metadata overhead, but each batch is larger and must be held in memory until the batch is full (or the query is complete). The recommended batch size is 10, which is also the default used in current versions of the 60East Client APIs.
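To make the trade-off concrete, the following sketch counts the batches needed for a result set and the approximate payload held per batch. The function names are hypothetical, and the per-batch figure counts record payload only, using the average record size.

```python
def batch_count(record_count, batch_size):
    """Number of batches needed to deliver record_count records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return -(-record_count // batch_size)  # integer ceiling division


def batch_payload_bytes(batch_size, avg_record_bytes):
    """Approximate payload held in memory for one full batch."""
    return batch_size * avg_record_bytes


if __name__ == "__main__":
    for size in (1, 10, 100):
        print(size, batch_count(3_000_000, size), batch_payload_bytes(size, 1024))
```

For the 3 million record example, a batch size of 10 needs 300,000 batches of roughly 10KB each, while a batch size of 100 cuts that to 30,000 batches at roughly 100KB each.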
