Benchmarks

Basic Benchmarks

Basic benchmarks cover fundamental parallel algorithms, including matrix and vector operations such as matrix transposition and inversion, vector addition, and inner product, as well as various parallel reduction and sorting algorithms. We also selected some programs from Polybench, which include slightly more complex linear algebra algorithms such as convolution, various forms of matrix multiplication and addition, covariance calculation, and finite difference calculation.

Program	Type	Description
BITONIC_SORT	alphaTest	bitonic sort
DOT_PRODUCT	alphaTest	dot product
MATRIX_INVERSION	alphaTest	matrix inversion
MATRIX_TRANSPOSE	alphaTest	matrix transpose
MERGE_SORT	alphaTest	merge sort
NBODY	alphaTest	Simulation of N-body problems
NQUEEN	alphaTest	N-Queens
PREFIX_SUM	alphaTest	Prefix sum
RADIX_SORT	alphaTest	Radix sort
REDUCTION_MAX	alphaTest	Maximum reduction
REDUCTION_SUM	alphaTest	sum reduction
TRIANGLE_AREA	alphaTest	Heron's formula to calculate the area of a triangle
VECTORADD	alphaTest	Vector addition of multiple data types
VECTORDIV	alphaTest	Single-precision floating-point vector division
VECTORMMA	alphaTest	Vector multiply add with constant memory
2DCONV	Polybench	2D convolution
2MM	Polybench	Two matrix multiplications (alpha * A * B * C + beta * D)
3DCONV	Polybench	3D convolution
3MM	Polybench	Three matrix multiplications ((A * B) * (C * D))
ATAX	Polybench	Matrix transpose and vector multiplication
BICG	Polybench	BiCG sub-kernel of the BiCGStab linear solver (matrix transpose vector multiply)
CORR	Polybench	Correlation coefficient calculation
COVAR	Polybench	Covariance calculation
FDTD-2D	Polybench	2D finite-difference time-domain calculation
GEMM	Polybench	General matrix multiply (C = alpha * A * B + beta * C)
GESUMMV	Polybench	scalar-vector-matrix multiplication
GRAMSCHM	Polybench	Gram-Schmidt
MVT	Polybench	Matrix-vector product and transpose
SR2K	Polybench	Symmetric rank-2k update
SRK	Polybench	Symmetric rank-k update
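
To make one of the table entries concrete, the compare-exchange network behind BITONIC_SORT can be sketched in a few lines of Python. This is only an illustrative sketch, not the benchmark's own implementation: the function name `bitonic_sort` is hypothetical, and the sketch assumes the input length is a power of two. The loop over `i` is the part a GPU version would typically run with one thread per element.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Assumption: len(values) is a power of two (or zero).
    """
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            # Every iteration of this loop is independent of the others,
            # which is what makes the network suitable for parallel hardware.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the eight values in ascending order; a list whose length is not a power of two raises `ValueError`.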

Advanced Benchmarks

Advanced benchmarks cover complex algorithms from fields such as image processing, high-performance computing, and machine learning, including some Rodinia benchmarks involving medical imaging, physics simulation, and more. They also include typical PyTorch operators such as Conv2d, ReLU, and MaxPool2d, and provide complete neural network models such as ResNet18, AlexNet, and Yolov3.

Program	Type	Description
rgb2gray	image	Convert rgb format image to grayscale image
Img scale	image	image scaling
NOISEREMOVEV1	image	Image Denoising
NOISEREMOVEV2	image	Image denoising optimized with shared memory
SobelFilter	image	Sobel operator, also known as Sobel-Feldman operator, or Sobel filter, is an image edge detection algorithm widely used in image processing and computer vision.
bilateralFilter	image	The bilateral filter is a non-linear filter that weights neighboring pixels by both their spatial proximity and the similarity of their values, which lets it remove noise while preserving edges. It is simple, non-iterative, and local.
gaussian	Rodinia/Linear Algebra	Gaussian elimination calculates the result row by row, solving for all variables in the linear system
lud	Rodinia/Linear Algebra	LU Decomposition is an algorithm to calculate the solutions of a set of linear equations. The LUD kernel decomposes a matrix as the product of a lower triangular matrix and an upper triangular matrix.
heartwall	Rodinia/Medical Imaging	The Heart Wall application tracks the movement of a mouse heart over a sequence of 104 ultrasound images of 609×590 pixels to record its response to a stimulus. In its initial stage, the program performs image processing operations on the first image to detect the initial, partial shapes of the inner and outer heart walls. These operations include edge detection, SRAD despeckling, morphological transformation, and dilation. To reconstruct approximate full shapes of the heart walls, the program generates ellipses that are superimposed on the image and sampled to mark points on the heart walls (Hough search). In its final stage (heart wall tracking, the part presented here), the program tracks the movement of the surfaces by detecting the movement of the image areas under the sample points as the shapes of the heart walls change throughout the image sequence.
particle_filter_naive	Rodinia/Medical Imaging	The particle filter (PF) is a statistical estimator of the location of a target object given noisy measurements of that target’s location and an idea of the object’s path in a Bayesian framework.
leukocyte	Rodinia/Medical Imaging	The Leukocyte application detects and tracks rolling leukocytes (white blood cells) in in vivo video microscopy of blood vessels. The speed at which leukocytes roll provides important information about the inflammatory process, which could help biomedical researchers develop anti-inflammatory drugs. In this application, cells are detected in the first video frame and then tracked through subsequent frames. Detection is done by computing, for each pixel in the frame, the maximum Gradient Inverse Coefficient of Variation (GICOV) score over a range of possible ellipses. The GICOV score of an ellipse is the average gradient magnitude along the ellipse divided by the standard deviation of the gradient magnitude. The GICOV score matrix is then dilated to simplify the process of finding local maxima. For each local maximum, an active contour algorithm is used to determine the shape of the cell more accurately. Tracking is done by first computing a Motion Gradient Vector Flow (MGVF) matrix in the region around each cell. The MGVF is a gradient field biased along the direction of blood flow, computed using an iterative Jacobian solution procedure. After computing the MGVF, the active contour is used again to refine the shape and determine the new location of each cell.
particlefilter_double	Rodinia/Medical Imaging	The particle filter (PF) is a statistical estimator of the location of a target object given noisy measurements of that target's location and an idea of the object's path in a Bayesian framework. The PF has many applications, ranging from video surveillance for tracking vehicles, cells, and faces to video compression. This particular implementation is optimized for tracking cells, in particular leukocytes and cardiomyocytes. After a target object is selected, the PF starts tracking it by making a series of guesses about the current frame based on what is already known from the previous frame. The PF then uses a predefined likelihood model to determine how likely each of these guesses is. It normalizes the guesses by their likelihoods and sums the normalized guesses to determine the object's current location. Finally, the PF updates the guesses based on the object's current location before repeating the process for all remaining frames in the video.
hotspot	Rodinia/Physics Simulation	Hotspot is a widely used tool to estimate processor temperature based on an architectural floorplan and simulated power measurements. The thermal simulation iteratively solves a series of differential equations for block temperatures.
hotspot3d	Rodinia/Physics Simulation	Thermal simulation based on 3D space
srad_v1	Rodinia/Image Processing	SRAD (Speckle Reducing Anisotropic Diffusion) is a diffusion method for ultrasonic and radar imaging applications based on partial differential equations (PDEs). Version 1.
srad_v2	Rodinia/Image Processing	SRAD (Speckle Reducing Anisotropic Diffusion) is a diffusion method for ultrasonic and radar imaging applications based on partial differential equations (PDEs). Version 2.
nn	Rodinia/Data Mining	NN (Nearest Neighbor) finds the k-nearest neighbors from an unstructured data set.
streamcluster	Rodinia/Data Mining	Stream Cluster (SC) solves the online clustering problem. The streamcluster kernel is adapted from the streamcluster benchmark in the Parsec suite developed by Princeton University. The Parsec technical report [1] describes stream clustering as follows: "For a stream of input points, it finds a predetermined number of intermediate points in order to assign each point to its closest center. The quality of the clustering is determined by the sum of the squared distances (SSQ) metric."
lavaMD	Rodinia/Molecular Dynamics	The code calculates particle potential and relocation due to mutual forces between particles within a large 3D space.
myocyte	Rodinia/Biological Simulation	The Myocyte application models cardiomyocytes (heart muscle cells) and simulates their behavior. The model combines cardiomyocyte electrical activity with the calcineurin pathway, a key aspect of heart failure development. The model spans a large number of timescales to reflect how changes in heart rate observed during exercise or stress promote activation of the calcineurin pathway, which ultimately leads to the expression of many genes that remodel cardiac structure. It can be used to identify potential therapeutic targets that may be useful for heart failure treatment. Biochemical reactions, ion transport, and electrical activity in cells are modeled with 91 ordinary differential equations (ODEs) determined by more than 200 experimentally validated parameters. The model is simulated by solving this set of ODEs over specified time intervals.
nw/needle	Rodinia/Bioinformatics	Needleman-Wunsch is a nonlinear global optimization method for DNA sequence alignments.
pathfinder	Rodinia/Grid Traversal	PathFinder uses dynamic programming to find a path on a 2-D grid from the bottom row to the top row with the smallest accumulated weights, where each step of the path moves straight ahead or diagonally ahead.
histogram	image	64-bin and 256-bin histograms
bucketsort	miscellaneous	The bucket sort algorithm distributes elements into multiple buckets and then sorts each bucket.
RBF	miscellaneous	An RBF (Radial Basis Function) network is generally a feed-forward neural network with a single hidden layer. It uses radial basis functions as the activation functions of the hidden-layer neurons, and the output layer is a linear combination of the outputs of the hidden-layer neurons.
pcm	miscellaneous	Pulse Code Modulation (PCM) produces a digital signal by sampling, quantizing, and encoding a continuously varying analog signal. The advantage of PCM is good sound quality; the disadvantage is the large data volume. PCM can provide users with dedicated digital data line services at rates from 2 Mbit/s to 155 Mbit/s, and can also carry other services such as voice, image transmission, and distance learning. PCM has two standards: E1 and T1. PCM is the simplest and most commonly used form of waveform coding: it uniformly quantizes and encodes the values obtained by sampling and A/D conversion. It is the basis of other encoding algorithms.
DCT 8x8	miscellaneous	The Discrete Cosine Transform (DCT) is a transform related to the Fourier transform. It is similar to the Discrete Fourier Transform (DFT) but uses only real numbers. A DCT is equivalent to a DFT of roughly twice the length operating on real data with even symmetry (the Fourier transform of a real even function is itself real and even); in some variants the input or output is shifted by half a sample.
BlackSholes	miscellaneous	The Black-Scholes model (BS model), also known as the Black-Scholes-Merton model, is a mathematical model for pricing financial derivatives such as options and warrants. It was first proposed by the economists Myron Scholes and Fischer Black and later refined by Robert C. Merton, who extended it to assets that pay dividends.
eigenvalues	miscellaneous	An eigenvalue is an important concept in linear algebra. If A is a square matrix of order n and there exist a number m and a non-zero n-dimensional column vector x such that Ax = mx, then m is called an eigenvalue (or characteristic value) of A.
fastwalshTransform	miscellaneous	The Walsh-Hadamard Transform (WHT) is a generalized Fourier transform used for spectral analysis in signal processing, integrated circuits, and image processing, where it can replace the discrete Fourier transform. The Fast Walsh-Hadamard Transform is a fast algorithm for computing the WHT, analogous to the FFT.
page rank	miscellaneous	PageRank, also known as web page ranking or Google page ranking, is a technique that ranks web pages based on the hyperlinks between them. It is named after Google co-founder Larry Page. Google uses it to reflect the relevance and importance of web pages, and it is one of the factors commonly used to evaluate the effectiveness of search engine optimization. Larry Page and Sergey Brin invented the technique at Stanford University in 1998. PageRank determines the rank of a page from the web's vast hyperlink structure. Google interprets a link from page A to page B as page A voting for page B, and assigns a new rank based on the source of the vote (and even the source of that source, that is, the pages linking to page A) and on the rank of the voting target. Simply put, a high-ranked page can boost the rank of the lower-ranked pages it links to.
mandelbortset	miscellaneous	The Mandelbrot set is a set of points in the complex plane that forms a fractal, named after the mathematician Benoit Mandelbrot. The Mandelbrot set has some similarities to Julia sets, such as using the same complex quadratic polynomials for iteration.
torch.nn.Conv3d	Pytorch OP	Conv3d operator
torch.nn.BatchNorm3d	Pytorch OP	BatchNorm3d operator
torch.nn.Conv2d	Pytorch OP	Conv2d operator
torch.nn.LeakyReLU	Pytorch OP	LeakyReLU operator
torch.nn.ReLU	Pytorch OP	ReLU operator
torch.nn.MaxPool2d	Pytorch OP	MaxPool2d operator
torch.nn.AdaptiveAvgPool2d	Pytorch OP	AdaptiveAvgPool2d operator
torch.nn.Dropout	Pytorch OP	Dropout operator
torch.nn.Linear	Pytorch OP	Linear operator
torch.nn.BatchNorm2d	Pytorch OP	BatchNorm2d operator
torch.nn.Hardswish	Pytorch OP	Hardswish operator
torch.nn.Hardsigmoid	Pytorch OP	Hardsigmoid operator
torch.nn.SiLU	Pytorch OP	SiLU operator
torch.nn.Sigmoid	Pytorch OP	Sigmoid operator
torch.nn.Embedding	Pytorch OP	Embedding operator
torch.nn.LayerNorm	Pytorch OP	LayerNorm operator
torch.nn.GELU	Pytorch OP	GELU operator
torch.nn.Tanh	Pytorch OP	Tanh operator
torch.nn.Softmax	Pytorch OP	Softmax operator
torch.nn.GRU	Pytorch OP	GRU operator
torch.nn.LSTM	Pytorch OP	LSTM operator
lenet	Pytorch NN	lenet network
AlexNet	Pytorch NN	AlexNet network
GoogLeNet	Pytorch NN	GoogLeNet network
VGG	Pytorch NN	VGG network
ResNet18	Pytorch NN	ResNet18 network
Yolov3	Pytorch NN	Yolov3 network
Yolov5	Pytorch NN	Yolov5 network
Densenet	Pytorch NN	Densenet network
squeezenet	Pytorch NN	squeezenet network
mobilenetv2	Pytorch NN	mobilenetv2 network
mobilenetv3	Pytorch NN	mobilenetv3 network
inception_v1	Pytorch NN	inceptionv1 network
inception_v2	Pytorch NN	inceptionv2 network
inception_v3	Pytorch NN	inceptionv3 network
ShuffleNetV2	Pytorch NN	ShuffleNetV2 network
EfficientNet	Pytorch NN	EfficientNet network
transformer	Pytorch NN	transformer network
Lenet train	Pytorch NN	LeNet training
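
As an illustration of the dynamic programming described in the pathfinder entry above, the following Python sketch computes the smallest accumulated weight of a path from the bottom row to the top row, where each step moves straight ahead or diagonally ahead. It is a sequential sketch under stated assumptions, not the Rodinia kernel: the function name `min_path_cost` and the list-of-rows grid layout are hypothetical choices made for this example.

```python
def min_path_cost(grid):
    """Smallest accumulated weight of a bottom-to-top path through `grid`.

    Assumptions: `grid` is a non-empty list of equal-length rows, with
    grid[0] as the top row and grid[-1] as the bottom row.
    """
    cols = len(grid[0])
    # Best cost of reaching each cell of the current row, starting at the bottom.
    cost = list(grid[-1])
    for row in reversed(grid[:-1]):
        new_cost = []
        for c in range(cols):
            # A cell can be reached from directly below or diagonally below.
            best = cost[c]
            if c > 0:
                best = min(best, cost[c - 1])
            if c < cols - 1:
                best = min(best, cost[c + 1])
            new_cost.append(row[c] + best)
        cost = new_cost
    return min(cost)
```

Each row depends only on the row below it, so the cells within a row can be computed independently of one another; that independence is what a parallel version of this recurrence can exploit.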