Benchmarks
Basic Benchmarks
Basic benchmarks cover fundamental parallel algorithms: matrix and vector operations such as matrix transposition and inversion, vector sums, and inner products, as well as various parallel reduction and sorting algorithms. We also selected some programs from PolyBench, which add slightly more complex linear algebra algorithms such as convolutions, various forms of matrix multiplication and addition, covariance calculations, and finite-difference calculations.
Program | Type | Description | Support or Not |
---|---|---|---|
BITONIC_SORT | alphaTest | bitonic sort | |
DOT_PRODUCT | alphaTest | dot product | |
MATRIX_INVERSION | alphaTest | matrix inversion | |
MATRIX_TRANSPOSE | alphaTest | matrix transpose | |
MERGE_SORT | alphaTest | merge sort | |
NBODY | alphaTest | Simulation of N-body problems | |
NQUEEN | alphaTest | N-Queens problem | |
PREFIX_SUM | alphaTest | Prefix sum | |
RADIX_SORT | alphaTest | Radix sort | |
REDUCTION_MAX | alphaTest | Maximum reduction | |
REDUCTION_SUM | alphaTest | Sum reduction | |
TRIANGLE_AREA | alphaTest | Heron's formula to calculate the area of a triangle | |
VECTORADD | alphaTest | Vector addition of multiple data types | |
VECTORDIV | alphaTest | Single-precision floating-point vector division | |
VECTORMMA | alphaTest | Vector multiply add with constant memory | |
2DCONV | Polybench | 2D convolution | |
2MM | Polybench | 2 matrix multiplications (alpha * A * B * C + beta * D) | |
3DCONV | Polybench | 3D convolution | |
3MM | Polybench | 3 matrix multiplications ((A * B) * (C * D)) | |
ATAX | Polybench | matrix transpose vector multiply | |
BICG | Polybench | BiCG sub-kernel of the BiCGStab linear solver | |
CORR | Polybench | Correlation coefficient calculation | |
COVAR | Polybench | Covariance calculation | |
FDTD-2D | Polybench | 2D finite-difference time-domain (FDTD) calculation | |
GEMM | Polybench | General matrix multiply-add (C = alpha * A * B + beta * C) | |
GESUMMV | Polybench | scalar-vector-matrix multiplication | |
GRAMSCHM | Polybench | Gram-Schmidt | |
MVT | Polybench | Matrix-vector product and transpose | |
SR2K | Polybench | Symmetric rank-2k update | |
SRK | Polybench | Symmetric rank-k update | |
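
The formulas in the GEMM and 2MM rows can be checked against a small reference implementation when validating benchmark output. The following is a minimal pure-Python sketch, not the benchmark code itself; the function names `matmul`, `gemm`, and `two_mm` are hypothetical and matrices are assumed to be lists of rows.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def gemm(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C (formula from the GEMM row)."""
    ab = matmul(a, b)
    return [[alpha * ab[i][j] + beta * c[i][j] for j in range(len(c[0]))]
            for i in range(len(c))]


def two_mm(alpha, a, b, c, beta, d):
    """Return alpha * A * B * C + beta * D (formula from the 2MM row)."""
    # (A * B) is computed first, then reused as the left operand of a GEMM.
    return gemm(alpha, matmul(a, b), c, beta, d)


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    identity = [[1, 0], [0, 1]]
    print(gemm(2, a, b, 3, identity))
```

A reference like this is only practical for small inputs, but it is enough to compare against the results of a parallel kernel on a few test matrices.
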
Advanced Benchmarks
Advanced benchmarks cover complex algorithms from fields such as image processing, high-performance computing, and machine learning, including some of Rodinia's benchmarks for medical imaging, physics simulation, and other domains. They also include typical PyTorch operators such as Conv2d, ReLU, and MaxPool2d, and provide complete neural network models such as ResNet18, AlexNet, and Yolov3.
Program | Type | Description | Support or Not |
---|---|---|---|
rgb2gray | image | Convert an RGB image to a grayscale image | |
Img scale | image | image scaling | |
NOISEREMOVEV1 | image | Image Denoising | |
NOISEREMOVEV2 | image | Image denoising optimized with shared memory | |
SobelFilter | image | Sobel operator, also known as Sobel-Feldman operator, or Sobel filter, is an image edge detection algorithm widely used in image processing and computer vision. | |
bilateralFilter | image | The bilateral filter is a non-linear filter that balances the spatial proximity of pixels against the similarity of their values, taking both spatial information and gray-level similarity into account to remove noise while preserving edges. It is simple, non-iterative, and local. | |
gaussian | Rodinia/Linear Algebra | Gaussian elimination calculates the result row by row, solving for all variables in the linear system | |
lud | Rodinia/Linear Algebra | LU Decomposition is an algorithm to calculate the solutions of a set of linear equations. The LUD kernel decomposes a matrix as the product of a lower triangular matrix and an upper triangular matrix. | |
heartwall | Rodinia/Medical Imaging | The Heart Wall application tracks the movement of a mouse heart over a sequence of 104 ultrasound images, each 609×590 pixels, to record its response to stimuli. In its initial stage, the program performs image processing operations on the first image to detect the initial, partial shapes of the inner and outer heart walls. These operations include edge detection, SRAD despeckling, morphological transformation, and dilation. To reconstruct the approximate full shape of the heart walls, the program generates ellipses that are superimposed on the image and sampled to mark points on the heart walls (Hough Search). In its final stage (heart wall tracking, presented here), the program tracks the movement of the surfaces by detecting the movement of the image regions under the sample points as the shapes of the heart walls change throughout the image sequence. | |
particle_filter_naive | Rodinia/Medical Imaging | The particle filter (PF) is a statistical estimator of the location of a target object given noisy measurements of that target's location and an idea of the object's path, in a Bayesian framework. | |
leukocyte | Rodinia/Medical Imaging | The Leukocyte application detects and tracks rolling leukocytes (white blood cells) in in vivo video microscopy of blood vessels. The speed at which leukocytes roll provides important information about the inflammatory process, which can help biomedical researchers develop anti-inflammatory drugs. In this application, cells are detected in the first video frame and then tracked through subsequent frames. Detection is done by computing, for each pixel in the frame, the maximum Gradient Inverse Coefficient of Variation (GICOV) score over a range of possible ellipses. The GICOV score of an ellipse is the mean gradient magnitude along the ellipse divided by the standard deviation of the gradient magnitude. The matrix of GICOV scores is then dilated to simplify the process of finding local maxima. For each local maximum, an active contour algorithm is used to determine the shape of the cell more accurately. Tracking is done by first computing a Motion Gradient Vector Flow (MGVF) matrix in the region around each cell. The MGVF is a gradient field biased in the direction of blood flow, computed using an iterative Jacobi solution procedure. After computing the MGVF, the active contour is used again to refine the shape and determine the new location of each cell. | |
particlefilter_double | Rodinia/Medical Imaging | A particle filter (PF) is a statistical estimator of the location of a target object given noisy measurements of that location and an idea of the object's path, in a Bayesian framework. PF has many applications, ranging from video surveillance for tracking vehicles, cells, and faces to video compression. This particular implementation is optimized for tracking cells, in particular leukocytes and cardiac myocytes. After a target object is selected, PF begins tracking it by making a series of guesses about its location in the current frame, based on what is already known from the previous frame. PF then uses a predefined likelihood model to determine how likely each guess is. It normalizes the guesses based on their likelihoods and sums the normalized guesses to determine the object's current location. Finally, PF updates the guesses based on the object's current location before repeating the process for all remaining frames in the video. | |
hotspot | Rodinia/Physics Simulation | Hotspot is a widely used tool for estimating processor temperature based on an architectural floorplan and simulated power measurements. The thermal simulation iteratively solves a series of differential equations for block temperatures. | |
hotspot3d | Rodinia/Physics Simulation | Thermal simulation based on 3D space | |
srad_v1 | Rodinia/Image Processing | SRAD (Speckle Reducing Anisotropic Diffusion) is a diffusion method for ultrasonic and radar imaging applications based on partial differential equations (PDEs). Version 1. | |
srad_v2 | Rodinia/Image Processing | SRAD (Speckle Reducing Anisotropic Diffusion) is a diffusion method for ultrasonic and radar imaging applications based on partial differential equations (PDEs). Version 2. | |
nn | Rodinia/Data Mining | NN (Nearest Neighbor) finds the k-nearest neighbors from an unstructured data set. | |
streamcluster | Rodinia/Data Mining | Streaming Cluster (SC) solves the online clustering problem. The streamcluster kernel is adapted from the streamcluster benchmark in the Parsec suite developed by Princeton University. The Parsec technical report [1] describes stream clustering as follows: "For a stream of input points, it finds a predetermined number of intermediate points in order to assign each point to its closest center. The quality of the clustering is determined by the sum of the squared distances (SSQ) metric." | |
lavaMD | Rodinia/Molecular Dynamics | The code calculates particle potential and relocation due to mutual forces between particles within a large 3D space. | |
myocyte | Rodinia/Biological Simulation | The Myocyte application models a cardiac myocyte (heart muscle cell) and simulates its behavior. The model combines cardiac myocyte electrical activity with the calcineurin pathway, a key aspect of the development of heart failure. The model spans a large number of time scales to reflect how changes in heart rate, as observed during exercise or stress, contribute to activation of the calcineurin pathway, which ultimately leads to the expression of numerous genes that remodel the heart's structure. It can be used to identify potential therapeutic targets that may be useful for treating heart failure. Biochemical reactions, ion transport, and electrical activity in the cell are modeled with 91 ordinary differential equations (ODEs) determined by more than 200 experimentally validated parameters. The model is simulated by solving this group of ODEs over a specified time interval. | |
nw/needle | Rodinia/Bioinformatics | Needleman-Wunsch is a nonlinear global optimization method for DNA sequence alignments. | |
pathfinder | Rodinia/Grid Traversal | PathFinder uses dynamic programming to find a path on a 2-D grid from the bottom row to the top row with the smallest accumulated weights, where each step of the path moves straight ahead or diagonally ahead. | |
histogram | image | 64-bin and 256-bin histograms | |
bucketsort | miscellaneous | The bucket sort algorithm distributes elements into multiple buckets and then sorts each bucket. | |
RBF | miscellaneous | An RBF (radial basis function) network is typically a feed-forward neural network with a single hidden layer. It uses radial basis functions as the activation functions of the hidden-layer neurons, and the output layer is a linear combination of the hidden-layer outputs. | |
pcm | miscellaneous | Pulse Code Modulation (PCM) produces a digital signal by sampling, quantizing, and encoding a continuously varying analog signal. Its advantage is good sound quality; its disadvantage is the large data size. PCM can provide users with dedicated digital data line services at rates from 2 Mbit/s to 155 Mbit/s, as well as other services such as voice, image transmission, and distance learning. PCM has two standards: E1 and T1. It is the simplest and most commonly used form of waveform coding: the values obtained by sampling and A/D conversion are uniformly quantized and then encoded directly. It is the basis of other encoding algorithms. | |
DCT 8x8 | miscellaneous | The discrete cosine transform (DCT) is a transform related to the Fourier transform. It is similar to the discrete Fourier transform (DFT) but uses only real numbers. A DCT is equivalent to a DFT of roughly twice the length operating on a real, even function (the Fourier transform of a real, even function is itself real and even); in some variants the input or output is shifted by half a sample. | |
BlackSholes | miscellaneous | The Black-Scholes model (BS model), also known as the Black-Scholes-Merton model, is a mathematical model for pricing financial derivatives such as options or warrants. It was first proposed by the American economists Myron Scholes and Fischer Black, and later refined by Robert C. Merton so that it also applies when dividends are paid. | |
eigenvalues | miscellaneous | An eigenvalue is an important concept in linear algebra. If A is a square matrix of order n and there exist a number m and a non-zero n-dimensional column vector x such that Ax = mx, then m is called an eigenvalue (or characteristic value) of A. | |
fastwalshTransform | miscellaneous | The Walsh-Hadamard transform (WHT) is a generalized Fourier transform used for spectrum analysis in signal processing, integrated circuits, and image processing, where it can replace the discrete Fourier transform. The fast Walsh-Hadamard transform is a fast algorithm for computing the WHT, analogous to the FFT. | |
page rank | miscellaneous | PageRank, also known as webpage ranking, webpage level, or page ranking, is a technique that rates web pages based on the hyperlinks between them and serves as one of the factors in ranking web pages. It is named after Google co-founder Larry Page. Google uses it to reflect the relevance and importance of web pages, and it is one of the factors often used to evaluate the effectiveness of page optimization in search engine optimization (SEO). Google founders Larry Page and Sergey Brin invented the technique at Stanford University in 1998. PageRank determines the rank of a page from the web's vast network of hyperlinks. Google interprets a link from page A to page B as page A voting for page B, and assigns a new rating based on the source of the vote (and even the source of the source, that is, the pages that link to page A) and the rating of the voting target. Simply put, a high-ranked page can boost the rank of lower-ranked pages. | |
mandelbortset | miscellaneous | The Mandelbrot set is a set of points in the complex plane that forms a fractal, named after the mathematician Benoit Mandelbrot. It shares some properties with Julia sets, such as using the same complex quadratic polynomials for iteration. | |
torch.nn.Conv3d | Pytorch OP | Conv3d operator | |
torch.nn.BatchNorm3d | Pytorch OP | BatchNorm3d operator | |
torch.nn.Conv2d | Pytorch OP | Conv2d operator | |
torch.nn.LeakyReLU | Pytorch OP | LeakyReLU operator | |
torch.nn.ReLU | Pytorch OP | ReLU operator | |
torch.nn.MaxPool2d | Pytorch OP | MaxPool2d operator | |
torch.nn.AdaptiveAvgPool2d | Pytorch OP | AdaptiveAvgPool2d operator | |
torch.nn.Dropout | Pytorch OP | Dropout operator | |
torch.nn.Linear | Pytorch OP | Linear operator | |
torch.nn.BatchNorm2d | Pytorch OP | BatchNorm2d operator | |
torch.nn.Hardswish | Pytorch OP | Hardswish operator | |
torch.nn.Hardsigmoid | Pytorch OP | Hardsigmoid operator | |
torch.nn.SiLU | Pytorch OP | SiLU operator | |
torch.nn.Sigmoid | Pytorch OP | Sigmoid operator | |
torch.nn.Embedding | Pytorch OP | Embedding operator | |
torch.nn.LayerNorm | Pytorch OP | LayerNorm operator | |
torch.nn.GELU | Pytorch OP | GELU operator | |
torch.nn.Tanh | Pytorch OP | Tanh operator | |
torch.nn.Softmax | Pytorch OP | Softmax operator | |
torch.nn.GRU | Pytorch OP | GRU operator | |
torch.nn.LSTM | Pytorch OP | LSTM operator | |
lenet | Pytorch NN | lenet network | |
AlexNet | Pytorch NN | AlexNet network | |
GoogLeNet | Pytorch NN | GoogLeNet network | |
VGG | Pytorch NN | VGG network | |
ResNet18 | Pytorch NN | ResNet18 network | |
Yolov3 | Pytorch NN | Yolov3 network | |
Yolov5 | Pytorch NN | Yolov5 network | |
Densenet | Pytorch NN | Densenet network | |
squeezenet | Pytorch NN | squeezenet network | |
mobilenetv2 | Pytorch NN | mobilenetv2 network | |
mobilenetv3 | Pytorch NN | mobilenetv3 network | |
inception_v1 | Pytorch NN | inceptionv1 network | |
inception_v2 | Pytorch NN | inceptionv2 network | |
inception_v3 | Pytorch NN | inceptionv3 network | |
ShuffleNetV2 | Pytorch NN | ShuffleNetV2 network | |
EfficientNet | Pytorch NN | EfficientNet network | |
transformer | Pytorch NN | transformer network | |
Lenet train | Pytorch NN | LeNet training | |
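
The dynamic programming described in the pathfinder row can be illustrated with a short sequential reference. This is a minimal sketch under stated assumptions, not the Rodinia kernel: the function name `pathfinder` is hypothetical, the grid is assumed to be a non-empty list of equally long rows with the top row first, and a "diagonal" step is taken to mean moving one column left or right while advancing one row.

```python
def pathfinder(grid):
    """Return the smallest accumulated weight of a bottom-to-top path.

    Each step moves up one row, either straight ahead or diagonally
    (one column to the left or right).
    """
    if not grid or not grid[0]:
        raise ValueError("grid must be non-empty")
    cols = len(grid[0])
    # best[j] is the cheapest cost of any path that starts in the bottom
    # row and ends at column j of the row processed so far.
    best = list(grid[-1])
    for row in reversed(grid[:-1]):
        best = [
            row[j] + min(best[max(j - 1, 0):min(j + 2, cols)])
            for j in range(cols)
        ]
    return min(best)


if __name__ == "__main__":
    grid = [
        [1, 9, 9],
        [9, 1, 9],
        [9, 9, 1],
    ]
    print(pathfinder(grid))
```

Within one row, every entry of `best` depends only on the previous row, which is why the per-row update lends itself to a parallel implementation.
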