Field Manual: Comparing spot patterns with sharkGrid

sharkGrid allows connected computers to participate in spot pattern recognition scans run from Wildbook for Whale Sharks. When you start a pattern matching scan for an encounter in Wildbook, you actually create a scanTask made up of individual comparisons called scanWorkItems. The scanWorkItems of a scanTask are distributed in groups across the grid of computers and executed in parallel. The results are reassembled once all comparisons are complete. Each encounter can have up to two active scanTasks: one for a left-side pattern and another for a right-side pattern.
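The split of a scanTask into scanWorkItems, their parallel execution in groups, and the final reassembly can be sketched in Python. This is a minimal illustration under stated assumptions, not Wildbook's code: ScanWorkItem, compare, make_groups, run_group, and run_scan_task are hypothetical names, the comparison score is a placeholder, and a local thread pool stands in for the grid of remote nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanWorkItem:
    """One comparison: the new pattern against one stored pattern (hypothetical structure)."""
    task_id: str
    candidate_id: str


def compare(item):
    # Placeholder score only; stands in for the real spot-pattern comparison.
    return float(len(item.candidate_id))


def make_groups(items, group_size):
    # Split the scanWorkItems into fixed-size groups for distribution.
    return [items[i:i + group_size] for i in range(0, len(items), group_size)]


def run_group(group):
    # A "node" processes every comparison in the group it received.
    return [(item.candidate_id, compare(item)) for item in group]


def run_scan_task(task_id, candidate_ids, group_size=2, workers=4):
    items = [ScanWorkItem(task_id, cid) for cid in candidate_ids]
    results = {}
    # Local threads stand in for remote grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for group_result in pool.map(run_group, make_groups(items, group_size)):
            results.update(group_result)
    # Results are reassembled only after every group has returned.
    return results
```

For example, run_scan_task("scanL1234", ["A", "BB", "CCC"]) returns one placeholder score per stored pattern, keyed by candidate ID.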

Starting a spot pattern comparison scan on sharkGrid

Once you have extracted a spot pattern for an encounter, you can look for matches to it across the global set of patterns stored in Wildbook for Whale Sharks. To start looking for matches:

1. Select a left-side or right-side radio button from the Find Pattern Match form in the Action/Edit bar.
2. Click Start Scan.


Figure 1. Starting a scan with the Find Pattern Match form

Starting a pattern matching scan creates a new scanTask in sharkGrid. The scanTask is assigned a unique identifier defined as:

  • “scanL” + encounter number for a left-side scan
  • “scanR” + encounter number for a right-side scan

Note: An encounter can have only one left-side scanTask and one right-side scanTask, whether they are completed or not. If you redo a spot pattern and want to scan for a match again, you must first delete the old scanTask if it still exists.
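The naming scheme and the one-per-side rule can be sketched as a small registry. This assumes encounter numbers are handled as strings and that scanTasks are tracked in memory; ScanTaskRegistry and DuplicateScanTaskError are hypothetical names, not Wildbook classes.

```python
class DuplicateScanTaskError(Exception):
    """Raised when a scanTask for that encounter and side already exists (hypothetical)."""


class ScanTaskRegistry:
    """Hypothetical in-memory store: at most one left and one right scanTask per encounter."""

    def __init__(self):
        self._tasks = set()

    @staticmethod
    def task_id(encounter_number, side):
        # "scanL" or "scanR" followed by the encounter number.
        prefixes = {"left": "scanL", "right": "scanR"}
        if side not in prefixes:
            raise ValueError("side must be 'left' or 'right'")
        return prefixes[side] + str(encounter_number)

    def create(self, encounter_number, side):
        tid = self.task_id(encounter_number, side)
        if tid in self._tasks:
            # Completed or not, the old scanTask must be deleted first.
            raise DuplicateScanTaskError(tid)
        self._tasks.add(tid)
        return tid

    def delete(self, tid):
        self._tasks.discard(tid)

    def __contains__(self, tid):
        return tid in self._tasks
```

Because the ID is derived only from the side and the encounter number, a second scan of the same side collides with the first until the old scanTask is deleted.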

You can manage your scanTasks in the sharkGrid Administration page by selecting sharkGrid from the Administer menu (login required).

Examining the results of a scanTask

You can check the status of and view the results of scanTasks from the sharkGrid Administration page.

Pending scanTasks

Pending scanTasks are scanTasks that are:

  • being created for submission to the grid
  • being executed on the grid
  • completed but whose results have not yet been written


Figure 2. Two pending scanTasks in sharkGrid waiting for results to be written

You can delete a pending scanTask by clicking the Delete button.

You can examine the results of a pending scanTask (and move it to “Completed” status) by clicking Write Result. The scanTask will subsequently appear in the Completed scanTasks table.
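The lifecycle described in this section can be modeled as a small state machine. The state names are hypothetical labels for the three pending conditions listed above plus the Completed state, and the sketch assumes, as the text describes, that Write Result can be applied to any pending scanTask; it is not Wildbook's internal model.

```python
from enum import Enum


class ScanTaskState(Enum):
    # Hypothetical labels for the states described in this section.
    CREATING = "being created for submission to the grid"
    EXECUTING = "being executed on the grid"
    AWAITING_WRITE = "completed, results not yet written"
    COMPLETED = "results written for review"


def admin_table(state):
    # Every state except COMPLETED is listed under Pending scanTasks.
    return "Completed" if state is ScanTaskState.COMPLETED else "Pending"


def write_result(state):
    # Write Result moves a pending scanTask to Completed status.
    if state is ScanTaskState.COMPLETED:
        raise ValueError("scanTask results have already been written")
    return ScanTaskState.COMPLETED
```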

Completed scanTasks

Completed scanTasks are scanTasks that have been successfully completed in sharkGrid and whose results have been written out for review.


Figure 3. Several completed scanTasks in sharkGrid waiting to be removed

You can delete a completed scanTask that you initiated by clicking the Delete button.

You can examine the results of a completed scanTask by clicking View.

Note: Please promptly delete any completed scanTasks after viewing the results. The latest results can always be reviewed from the relevant encounter page.

Administering sharkGrid

This section describes how sharkGrid operates and how users with various levels of access can change its behavior.

The sharkGrid queue: Adding and deleting scanTasks

The creation and deletion of scanTasks occurs in a single queue. This means that if several users simultaneously create new scanTasks or delete existing ones, these changes are processed one at a time: a new create/delete operation starts only after the previous one in the queue has finished. As a result, a deleted scanTask may not disappear from the sharkGrid Administration page immediately.
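To illustrate why a deletion may not show up right away, here is a sketch of a single first-in, first-out queue drained by one worker thread, using Python's standard queue and threading modules. ScanTaskQueue and its methods are hypothetical; Wildbook's actual queue implementation is not described in this manual.

```python
import queue
import threading


class ScanTaskQueue:
    """Hypothetical sketch: create/delete operations applied one at a time, in order."""

    def __init__(self):
        self._ops = queue.Queue()
        self.tasks = set()  # scanTask IDs currently present
        self.log = []       # operations in the order they were applied
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, op, task_id):
        if op not in ("create", "delete"):
            raise ValueError("op must be 'create' or 'delete'")
        self._ops.put((op, task_id))

    def pending(self):
        # Approximate count of operations still waiting, like the Administrator view.
        return self._ops.qsize()

    def wait(self):
        # Block until every submitted operation has been applied.
        self._ops.join()

    def _run(self):
        # The single worker guarantees each operation finishes before the next starts.
        while True:
            op, task_id = self._ops.get()
            try:
                if op == "create":
                    self.tasks.add(task_id)
                else:
                    self.tasks.discard(task_id)
                self.log.append((op, task_id))
            finally:
                self._ops.task_done()
```

A delete submitted behind several creates only takes effect after those creates have been applied, which is why the page can lag behind the request.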

Administrator-level users can see the number of scanTask additions and deletions waiting in the queue in this section of the sharkGrid Administration page:


Figure 4. The scanTask queue in sharkGrid

Viewing connected client nodes and their performance

You can view information about connected client nodes doing work in sharkGrid from this section of the sharkGrid Administration page:


Figure 5. gridManager statistics presenting information about sharkGrid nodes and their efficiency

The following information is available for each node:

  • IP - the IP address of the node
  • NodeID - a unique identifier for the node. This identifier is randomly generated.
  • #CPU - The number of CPUs available for processing on the node computer. sharkGrid will take advantage of multicore CPU architectures and run scanWorkItems in parallel on multicore machines.
  • Targeted? - whether the client node is working on a specified scanTask or is generally available for processing all scanTasks
  • # Finished - the number of scanWorkItems (individual pattern comparisons) completed by the node
  • Chunk size - the number of scanWorkItems sent to the node after each request. All nodes start with one scanWorkItem initially and are given larger workloads with subsequent requests. Ultimately, chunk size is determined according to a pre-defined algorithm in Wildbook.
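The manual does not specify Wildbook's chunk-size algorithm, only that nodes start at one scanWorkItem, receive larger chunks on later requests, and are capped by the maximum chunk/group size setting described under Configuring sharkGrid behavior. The sketch below assumes a doubling rule purely for illustration; next_chunk_size and chunk_schedule are hypothetical names.

```python
def next_chunk_size(previous, max_chunk):
    # ASSUMPTION: double the previous chunk, never exceeding the configured maximum.
    return min(previous * 2, max_chunk)


def chunk_schedule(requests, max_chunk):
    """Chunk sizes one node would receive over its first `requests` requests."""
    sizes = []
    size = 1  # every node starts with one scanWorkItem
    for _ in range(requests):
        sizes.append(size)
        size = next_chunk_size(size, max_chunk)
    return sizes
```

Under this assumed rule and a maximum of 20, chunk_schedule(6, 20) returns [1, 2, 4, 8, 16, 20]. Growing the chunk lets a node prove it is responsive before it is trusted with a large workload.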

Overall performance in sharkGrid is measured by:

  • % inefficient collisions - this is the percentage of duplicate work performed by nodes in sharkGrid. The ideal is 0. A collision is detected when a node tries to check in a comparison already completed by another node.
  • Total work items completed since startup - the total number of comparisons performed in sharkGrid since Wildbook was last restarted.
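The two statistics can be sketched with a simple tracker. This assumes the collision percentage is computed as collisions divided by all check-ins; the manual does not state the exact denominator. CollisionTracker is a hypothetical name.

```python
class CollisionTracker:
    """Hypothetical sketch of the collision and completed-work statistics."""

    def __init__(self):
        self.completed = {}  # scanWorkItem ID -> node that checked it in first
        self.check_ins = 0
        self.collisions = 0

    def check_in(self, item_id, node_id):
        self.check_ins += 1
        first = self.completed.get(item_id)
        if first is not None and first != node_id:
            # Another node already finished this comparison: duplicate work.
            self.collisions += 1
            return False
        self.completed.setdefault(item_id, node_id)
        return True

    def percent_inefficient(self):
        # ASSUMPTION: denominator is every check-in, including collisions.
        if self.check_ins == 0:
            return 0.0
        return 100.0 * self.collisions / self.check_ins

    def total_completed(self):
        return len(self.completed)
```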

Configuring sharkGrid behavior

From the sharkGrid Administration page, you can configure the following performance parameters if you have Administrator privileges:

  • Set number of allowed nodes - defines the maximum number of client nodes that can participate in sharkGrid
  • Set node timeout - defines how many milliseconds can pass without a heartbeat from a client node before the node is considered to be no longer connected to sharkGrid
  • Set checkout timeout - defines how long (in milliseconds) a checked-out scanWorkItem stays reserved for its node before another node can check it out. The assumption is that, after this timeout period, the node that originally checked out the scanWorkItem has left the grid. A very small value may cause inefficient duplication of effort. A very large value may slow down overall scanTask processing.
  • Set number of allowed scanTasks - the maximum number of simultaneous scanTasks allowed in sharkGrid. Once the maximum is reached, existing scanTasks must be deleted before new ones can be added. Alternatively, an Administrator can raise the maximum.
  • Set maximum chunk/group size sent to nodes - the maximum number of comparisons (scanWorkItems) sent to a node per request. All nodes start at one and receive larger groups with each request. Setting this value too high may cause nodes to run out of memory or to take so long processing that inefficient collisions are registered. A very small value causes nodes to spend more time requesting data than processing it, which is also inefficient.
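The node timeout and checkout timeout settings can be illustrated with a short sketch. It assumes a scanWorkItem becomes eligible for reassignment once checkout_timeout_ms has elapsed since its checkout, and that a node counts as connected while its last heartbeat falls within node_timeout_ms. WorkItemCheckout and node_is_connected are hypothetical names, and the injectable clock exists only to make the sketch testable.

```python
import time


class WorkItemCheckout:
    """Hypothetical sketch of the checkout-timeout rule."""

    def __init__(self, checkout_timeout_ms, clock=time.monotonic):
        self.timeout_s = checkout_timeout_ms / 1000.0
        self.clock = clock
        self.checked_out = {}  # scanWorkItem ID -> (node ID, checkout time in seconds)
        self.done = set()

    def try_checkout(self, item_id, node_id):
        if item_id in self.done:
            return False
        now = self.clock()
        holder = self.checked_out.get(item_id)
        if holder is not None and now - holder[1] < self.timeout_s:
            return False  # still reserved for the original node
        # Unclaimed, or the original node is presumed to have left the grid.
        self.checked_out[item_id] = (node_id, now)
        return True

    def mark_done(self, item_id):
        self.done.add(item_id)
        self.checked_out.pop(item_id, None)


def node_is_connected(last_heartbeat_s, now_s, node_timeout_ms):
    # ASSUMPTION: connected while the last heartbeat is within the node timeout.
    return (now_s - last_heartbeat_s) * 1000.0 <= node_timeout_ms
```

The trade-off in the settings above shows up directly here: a short checkout timeout hands work to a second node while the first may still be computing it, and a long one leaves abandoned items idle.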

Video

The following video demonstrates scanning for a match with sharkGrid and analyzing the results.

http://www.whaleshark.org/video/scanningformatches/ScanningForAMatch.htm

sharkgrid.txt · Last modified: 2013/10/22 21:33 by webadmin