Parallel Universe Issue #31 Abstracts
Feature:
FPGA programming with OpenCL™
Knowing How to Program an FPGA Is a Skill You Need―and Here's How to Start
Field-programmable gate arrays (FPGAs) are exciting because they offer high performance with low
latency and power efficiency. These benefits are realized through their massive parallel capabilities
coupled with reconfigurability. An FPGA provides a reconfigurable sea of gates on which anyone can:
• Design their own custom hardware accelerator
• Deploy it for a single application
• Quickly reconfigure the device as a new accelerator for a different application
This article explains how you can make your own designs and use them to accelerate your applications.
Hardware engineers have long used FPGAs in place of ASICs in a variety of applications. Historically,
configuring (programming) them has been done with hardware description languages like Verilog* or
VHDL*. These design methodologies are familiar and useful for hardware engineers but completely
foreign to software developers.
New tools centered around OpenCL* bridge this gap to bring the benefits of FPGA hardware platforms
to software developers. Using these new tools, we can view FPGAs as highly configurable devices that
can transform custom algorithms, written in a C-like syntax, into fast and power-efficient hardware. The
flexibility and performance of these solutions are vast, thanks to the latest generation of FPGA devices,
which offer up to 10 TFLOPS of performance.
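
To make the idea concrete, here is a minimal sketch of an OpenCL C kernel. It is illustrative rather than
taken from the article; the host-side setup (platform, command queue, buffers) and the offline FPGA
compilation step are omitted.

    // vector_add.cl: a minimal OpenCL C kernel. An FPGA offline compiler
    // turns C-like source such as this into a custom hardware pipeline.
    __kernel void vector_add(__global const float *a,
                             __global const float *b,
                             __global float *c)
    {
        // Each work-item handles one element; on an FPGA these operations
        // are typically synthesized into deeply pipelined dedicated logic
        // rather than scheduled across fixed processor cores.
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }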
Accelerating Eigen Math Library for Automated Driving Workloads
Meeting the Need for Speed with Intel® Math Kernel Library
Automated driving workloads have several matrix operations at their core. Sensor fusion and
localization algorithms―such as different versions of the Kalman filter―are critical components
in the automated driving software pipeline. The Intel® Math Kernel Library (Intel® MKL) is a
powerhouse of tuned subprograms for numerous math operations, including a fast DGEMM. The
automated driving developer community typically uses Eigen*, a C++ math library, for matrix
operations. In addition to Intel MKL, LIBXSMM*, a highly tuned library for high-performance matrix-
matrix multiplications, shows potential to speed up matrix operations. In this article, we investigate and
improve the performance of native Eigen on matrix multiplication benchmarks and the extended Kalman
filter (EKF) by using Intel MKL and LIBXSMM with GNU* and Intel® compilers on the Intel® Xeon®
processor.
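
As a hedged illustration of the kind of substitution under investigation (a sketch, not the article's
benchmark code), the C++ fragment below multiplies two small Eigen matrices; defining
EIGEN_USE_MKL_ALL before including Eigen routes supported operations such as GEMM to Intel MKL.

    // Sketch: native Eigen matrix multiply. Uncommenting the #define routes
    // supported kernels (e.g., GEMM) to Intel MKL, one of the substitutions
    // whose effect the article measures.
    // #define EIGEN_USE_MKL_ALL
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        // Small dense matrices, the size class typical of Kalman filter updates.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 6);
        Eigen::MatrixXd B = Eigen::MatrixXd::Random(6, 6);

        // Dispatches to Eigen's own GEMM kernel, or to Intel MKL's dgemm
        // when EIGEN_USE_MKL_ALL is defined.
        Eigen::MatrixXd C = A * B;

        std::cout << C(0, 0) << "\n";
        return 0;
    }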

Speeding Algebra Computations with Intel® Math Kernel Library Vectorized
Compact Matrix Functions
Maximizing the Performance Benefits of the Compact Data Layout
High-performance computing (HPC) and machine learning applications often require performing
computations on large groups of very small matrices. The performance of these operations scales well
with increased core counts using OpenMP* loops or batched linear algebra functions―but vector
utilization in these methods is limited. BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear
Algebra Package) functions traditionally operate on data stored in a standard layout, with matrices in
row (or column) major order. Storing data in non-standard layouts that allow for cross-matrix
vectorization can provide a significant speedup in BLAS and LAPACK functions for small-sized matrices.
In this article, we describe a new data layout, called Compact, which interleaves matrices in blocks
according to the architecture’s SIMD (single instruction, multiple data) vector length. We demonstrate
its benefits for linear algebra computations such as matrix multiplication and LU factorization―
observing up to 60X and 20X speedups, respectively, for double precision over optimized kernels written
for standard data layouts. Intel® Math Kernel Library (Intel® MKL) 2018 includes new Compact functions
that support and maximize the performance benefits of the compact data layout.
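
The following C++ sketch illustrates the idea behind the layout; it is not the Intel MKL Compact API
itself, whose exact routine names and signatures should be taken from the MKL reference. Element
(i, j) of V consecutive matrices is stored contiguously, so the innermost loop runs across matrices and
maps naturally onto a single SIMD operation.

    // Illustration of cross-matrix vectorization with an interleaved layout.
    // packed[(i*N + j)*V + v] holds element (i, j) of matrix v in a pack of V,
    // where V is the SIMD vector length (assumed 4 here, e.g., AVX2 doubles).
    #include <cstddef>

    constexpr std::size_t V = 4;

    // Multiplies V matrices at once: C_v += A_v * B_v for v = 0..V-1.
    // Assumes c is zero-initialized and all matrices are N x N.
    void gemm_compact(const double* a, const double* b, double* c, std::size_t N) {
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t k = 0; k < N; ++k)
                for (std::size_t j = 0; j < N; ++j)
                    // The v-loop strides across matrices, not elements, so the
                    // compiler can emit one vector FMA for the whole batch.
                    for (std::size_t v = 0; v < V; ++v)
                        c[(i*N + j)*V + v] += a[(i*N + k)*V + v] * b[(k*N + j)*V + v];
    }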
Boosting Java* Performance in Big Data Applications
How New Enhancements Enable Faster and Better Numerical Computing
Since its introduction more than two decades ago, Java* has become a popular enterprise language.
Initially embraced for its write-once, run-anywhere property, Java saw its usage grow with the relative
ease of building new applications by integrating existing Java projects. The rise of several open-source
Java application stacks, such as Apache Hadoop*, has added to this growth. Fundamentally, developers
view Java as enhancing their productivity.
New enhancements to Java enable faster and better numerical computing. The JVM has been improved
to use Intel® AVX fused multiply-add (FMA) instructions in the implementation of the Math.fma() method.
This results in significant performance improvements in matrix multiplication, the basic workhorse of HPC and AI
applications.
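
As a small, hedged example of the pattern this accelerates (not code from the article), the dot product
below, the kernel inside every matrix multiplication, is written with Math.fma (available since Java 9),
which the JVM can now compile to a hardware fused multiply-add instruction:

    // FmaDot.java: requires Java 9 or later for Math.fma.
    public class FmaDot {
        static double dot(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                // fma(a, b, c) computes a*b + c with a single rounding step;
                // the JVM intrinsic maps it to a hardware FMA instruction.
                sum = Math.fma(x[i], y[i], sum);
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] x = {1.0, 2.0, 3.0};
            double[] y = {4.0, 5.0, 6.0};
            System.out.println(dot(x, y));  // prints 32.0
        }
    }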
Gaining Performance Insights Using the Intel® Advisor Python API
Getting Good Data to Make Code Tuning Decisions
Good design decisions are based on good data. Intel® Advisor is a dynamic analysis tool that’s part of
Intel® Parallel Studio XE, Intel’s comprehensive tool suite for building and modernizing code. Intel
Advisor helps answer essential performance questions by letting you collect insightful program metrics on
the vectorization and memory profile of your application. And, besides providing tailored reports using
the GUI and command line, Intel Advisor now gives you the added flexibility to mine a collected
database and create powerful new reports using Python*.

When you run Intel Advisor, it stores all the data it collects in a proprietary database. The new Python
API provides flexible access to that database for reporting on useful program metrics. This article
explains how to use this new functionality.
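
Below is a minimal sketch of such a script, based on the Python API as shipped with Intel Advisor 2018;
the module location, entry points, and record fields can differ by version, so treat the names as
assumptions to verify against your installation:

    # advisor_report.py: run with a Python interpreter that can see Advisor's
    # pythonapi directory (names here are assumptions; check your Advisor docs).
    import sys
    import advisor  # bundled with Intel Advisor, not installed from PyPI

    # Open a project directory created by a previous Advisor collection.
    project = advisor.open_project(sys.argv[1])

    # Load survey (hotspot and vectorization) results from the database.
    data = project.load(advisor.SURVEY)

    # Each bottom-up entry is a record of per-loop metrics, ready for custom
    # reporting (filtering, aggregation, or export to other tools).
    for row in data.bottomup:
        print(row)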
Welcome to the Intel® AI Academy
AI Education for All
With the rise of artificial intelligence (AI), it’s no surprise that Intel has a clear vision and approach to
engaging with those at the cutting edge of innovation―developers and academics. Whether you’re just
starting out or already an expert, the Intel® AI Academy provides learning materials, tools, and
technology to help shape, create, build, and develop the future of AI.