Intro to utilizing ispc

ispc is Intel’s compiler for generating optimised code across modern CPUs. It supports both Intel and non-Intel architectures.

From https://ispc.github.io/

“ispc is a compiler for a variant of the C programming language, with extensions for “single program, multiple data” (SPMD) programming. Under the SPMD model, the programmer writes a program that generally appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. (See the ispc documentation for more details and examples that illustrate this concept.)

ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs and the Intel Xeon Phi™ architecture; it frequently provides a 3x or more speedup on CPUs with 4-wide vector SSE units and 5x-6x on CPUs with 8-wide AVX vector units, without any of the difficulty of writing intrinsics code. Parallelization across multiple cores is also supported by ispc, making it possible to write programs that achieve performance improvement that scales by both number of cores and vector unit size.”

To use ispc, like any other language extension, you have to do some setup as it’s not properly supported by Visual Studio yet.

This post is a cheat-sheet for getting set up quickly, plus some undocumented features and tricks to make your life easier.

As of writing, the latest ispc compiler is version 1.12.

Keywords

ispc, C, C++, optimisation, vectorization, object orientation, Intel, AMD, CPU, VisualStudio.

The programming model


One thing to understand about ispc is that you can write traditional single-threaded code in it, and that code will run at a similar, if not the same, speed as in C or C++. There is no cost for switching between parallel and serial operations when using ispc’s SPMD parallelism. On average, ispc code runs faster when it can use the parallel execution model; plain serial C-style functions simply run at regular C speeds.

ispc sits somewhere between SIMD and thread parallelism, borrowing a bit from both.

ispc uses the concept of programs, which you can think of as threads made from SIMD instructions. This program model is a bit tricky to grasp at first. If you’re an intrinsic wizard you may be thinking “Wait does my machine even have enough lanes to execute this?” and if you make thread schedulers you’d probably wonder “What happens if one thread finishes before the others?”.

Worry not, it’s not that difficult. If you are told to write the same sentence 8 times, then ispc is like handing you a stick with 8 pencils attached. With threads it’s like trying to coordinate 8 hands.

The way ispc handles mapping of programs to lanes is easy: you explicitly give ispc a target on the command line and it’ll generate code that fits it.

Example:

--target=avx2-i32x8

will generate code for a gang of 8 parallel program instances using a 32-bit execution mask; ispc uses as many SIMD lanes as it needs to guarantee those 8 SPMD programs.
As for scheduling, ispc launches all SPMD programs at once in a gang, so complex logic takes as long as its longest code path. All programs execute simultaneously, so if one program enters a loop that runs longer than the rest, the other programs sit underutilized until the longest path has finished.
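To make the "longest code path" point concrete, here is a minimal C++ sketch that simulates a gang running a loop in lockstep. It is not how ispc is implemented; kGangWidth, GangStats, and run_gang are hypothetical names, and the per-lane mask is modeled as a plain array of bools.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical gang width, chosen to match an i32x8 target.
constexpr std::size_t kGangWidth = 8;

struct GangStats {
    int steps;        // lockstep iterations issued for the whole gang
    int useful_work;  // lane-iterations that actually did work
};

// Simulates a loop executed in lockstep: every step is issued for the
// whole gang, and a per-lane mask switches off the lanes whose own loop
// has already finished.
GangStats run_gang(const std::array<int, kGangWidth>& trip_counts) {
    std::array<int, kGangWidth> remaining = trip_counts;
    GangStats stats{0, 0};
    for (;;) {
        // Recompute the execution mask for this step.
        std::array<bool, kGangWidth> mask{};
        bool any_active = false;
        for (std::size_t lane = 0; lane < kGangWidth; ++lane) {
            mask[lane] = remaining[lane] > 0;
            any_active = any_active || mask[lane];
        }
        if (!any_active) break;  // the gang stops only when every lane is done

        ++stats.steps;
        for (std::size_t lane = 0; lane < kGangWidth; ++lane) {
            if (mask[lane]) {
                --remaining[lane];
                ++stats.useful_work;
            }
        }
    }
    // Useful work can never exceed the number of issued lane slots.
    assert(stats.useful_work <= stats.steps * static_cast<int>(kGangWidth));
    return stats;
}
```

If seven lanes need 1 iteration and one lane needs 10, run_gang reports 10 steps but only 17 useful lane-iterations out of 80 issued slots, which is the underutilization described above.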

Now that you’re up to speed with the coding paradigm, let’s set up our environment for ispc!

Debugging ispc within Visual Studio


Intel nicely provides an example on how to set up ispc with Visual Studio, generating object code from ispc and linking it into your own C/C++ code; but after that you’re on your own. So to properly integrate ispc using all available tools you’ll need to do a bit more than just linking object code.

Many C/C++ build setups give you a macro that tells debug and release builds apart; MSVC, for example, defines _DEBUG when you compile against the debug runtime (/MDd or /MTd). ispc is a bit different: -g only generates the symbols needed for debugging. Nicely enough, Visual Studio auto-detects these symbols, which lets you step through ispc code like you would any other C/C++ code.

In ispc, -g doesn’t trigger any macro definitions; in fact, no debug-related macro is defined for you at all. This means you need to edit the property pages of your ispc source files to add the compiler flag -DDEBUG in debug builds, which creates a macro definition for DEBUG inside ispc. Then you can add asserts and error checking galore without worrying about performance in release and production builds.
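The guard pattern itself is plain preprocessor work. Here it is sketched in C++, on the assumption that the same #ifdef DEBUG structure carries over to ispc source, since ispc also runs a C-style preprocessor. DBG_CHECK and average are hypothetical names used only for illustration.

```cpp
#include <cassert>

// DEBUG is not defined by the compiler here. It is expected to come from
// the build system as -DDEBUG, and only in debug configurations.
#ifdef DEBUG
#define DBG_CHECK(cond) assert(cond)
#else
#define DBG_CHECK(cond) ((void)0)  // compiles to nothing in release builds
#endif

// Hypothetical helper: integer average of `count` values.
// Precondition: values is non-null and count > 0 (checked in debug builds only).
int average(const int* values, int count) {
    DBG_CHECK(values != nullptr);
    DBG_CHECK(count > 0);
    long long sum = 0;
    for (int i = 0; i < count; ++i) {
        sum += values[i];
    }
    return static_cast<int>(sum / count);
}
```

With -DDEBUG the checks are live; without it they vanish, so release builds pay nothing for them.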

Syntax highlighting


At this point you can write ispc code with some nice tooling to make your coding life easier. However, you’ll still be working with a wall of single-color text, which can make finding a problem harder.
As of today, the ispc language is 8 years old, yet the only syntax highlighting plugins that exist are for Visual Studio Code, the little brother to Visual Studio; there still isn’t any syntax highlighting for Visual Studio itself.

Here’s a trick to make your life easier:

  1. In Visual Studio, open Tools->Options->Text Editor->File Extension
  2. Add the extension ispc and set its editor to Microsoft Visual C++
  3. Close and reopen your ispc files and voilà

Visual Studio will now assign the C++ syntax highlighting to ispc, which is accurate most of the time since ispc is syntactically similar to C with some C++ style sugar on top.

Macro tips, tricks, and issues


Because the ispc compiler is built on LLVM, we have access to preprocessor macros that aren’t documented in ispc’s user manual. The most important is __COUNTER__, which increments by one every time it is expanded. That makes it incredibly useful for generating unique names for types and functions programmatically.

Here is just a small list of useful macros at your disposal within ispc:

__COUNTER__
__VA_ARGS__ 
__LINE__ 
__FILE__ 
__BYTE_ORDER__ 
__ORDER_LITTLE_ENDIAN__ 
__ORDER_BIG_ENDIAN__ 

Note: Because __FILE__ produces a string literal, the compiler dies if you pass it as an argument. The way around this is to rely on string literal concatenation inside the format string of print():

print("We're inside "__FILE__ " at line % \n", __LINE__);
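Coming back to __COUNTER__, here is a small C++ sketch of the unique-name trick. The macro part (CONCAT, UNIQUE_NAME) is pure preprocessor work and is the piece that should carry over to ispc; the AutoRegister struct with a constructor is C++ only and is there just to make the effect observable. All names are hypothetical.

```cpp
#include <cassert>

// Two-level paste, so __COUNTER__ is expanded to a number before ##
// glues the tokens together.
#define CONCAT_INNER(a, b) a##b
#define CONCAT(a, b) CONCAT_INNER(a, b)
#define UNIQUE_NAME(prefix) CONCAT(prefix, __COUNTER__)

// Each expansion of __COUNTER__ yields the next integer.
constexpr int kFirstId = __COUNTER__;
constexpr int kSecondId = __COUNTER__;
static_assert(kSecondId == kFirstId + 1,
              "__COUNTER__ advances by one per expansion");

// Counts how many AutoRegister objects were constructed.
int& registered_count() {
    static int count = 0;
    return count;
}

struct AutoRegister {
    AutoRegister() { ++registered_count(); }
};

// Every use declares a differently named static object, so the macro can
// be repeated in one file without redefinition errors.
#define REGISTER_ONCE() static AutoRegister UNIQUE_NAME(auto_register_)

REGISTER_ONCE();
REGISTER_ONCE();
REGISTER_ONCE();
```

Without the generated suffix, the second REGISTER_ONCE() would redeclare the same variable and fail to compile.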

We’ve seen the good, now comes the ugly: never mix macros with foreach() loops!

The regular for() and cfor() loops are fine, but if you write something like:

#define MY_CONSTANT 10
void func() {
    foreach(i = 0...MY_CONSTANT) {
        // ...
    }
}

This will fail to compile: inside the foreach() range, MY_CONSTANT is treated as undefined instead of being expanded to 10.
However, by changing the definition to:

const int MY_CONSTANT = 10;
void func() {
    foreach(i = 0...MY_CONSTANT) {
        // ...
    }
}

This compiles successfully. It may be resolved in a later release of ispc, but for now it’s something to be aware of.
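One plausible explanation, offered as an assumption rather than anything the ispc manual states: a C-style preprocessor lexes 0...MY_CONSTANT as a single "preprocessing number" token, because such a token may continue through dots, letters, and underscores. The macro name is then never seen as an identifier, so it is never expanded. The C++ sketch below shows that tokenization rule with stringification; STR and STR_INNER are hypothetical helper macros.

```cpp
#include <cassert>
#include <cstring>

#define MY_CONSTANT 10

// Two-level stringify, so macro arguments are expanded first.
#define STR_INNER(x) #x
#define STR(x) STR_INNER(x)

// No spaces: the whole run is one preprocessing-number token, so
// MY_CONSTANT is not expanded.
const char* const kNoSpaces = STR(0...MY_CONSTANT);

// With spaces: three separate tokens, and MY_CONSTANT expands to 10.
const char* const kWithSpaces = STR(0 ... MY_CONSTANT);
```

Here kNoSpaces holds "0...MY_CONSTANT" while kWithSpaces holds "0 ... 10". If the same rule is what trips up ispc, writing the range with spaces around the dots may be enough to keep the macro version working, though that is untested here.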

Objects and Exports


Let’s say that you want to fully utilize the insane optimization algorithms inside of ispc on both your code logic and data-structures. This way we can heavily utilize AOS<->SOA handling built into ispc, as well as the gang execution model paired with the very well optimized thread handler ‘tasksys.cpp’.
Continue reading, because we’re going to dive into how we can use ispc for writing (almost) anything.
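If AOS and SOA are new to you, this minimal C++ sketch shows the two layouts and a round trip between them. It only illustrates the data layout idea, not ispc’s built-in handling; PointAoS, PointsSoA, to_soa, and to_aos are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: x, y, z of one point sit next to each other.
struct PointAoS {
    float x, y, z;
};

// Structure of arrays: all x values are contiguous, then all y, then all z,
// which is the layout SIMD lanes prefer to load from.
struct PointsSoA {
    std::vector<float> x, y, z;
};

PointsSoA to_soa(const std::vector<PointAoS>& aos) {
    PointsSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    for (const PointAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
    }
    return soa;
}

std::vector<PointAoS> to_aos(const PointsSoA& soa) {
    // All three component arrays must describe the same number of points.
    assert(soa.x.size() == soa.y.size() && soa.y.size() == soa.z.size());
    std::vector<PointAoS> aos;
    aos.reserve(soa.x.size());
    for (std::size_t i = 0; i < soa.x.size(); ++i) {
        aos.push_back({soa.x[i], soa.y[i], soa.z[i]});
    }
    return aos;
}
```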

Let’s do some object-oriented programming in ispc!

Because ispc is like old fashioned C we need to approach writing classes in the form of structs within structs.
In ispc you write an object in the form of:

struct MyObject {
    uniform int x;
    int y;
};
void ObjPrint(uniform MyObject* uniform obj) {
    print("x=%,y=%\n", obj->x, obj->y);
}
export void Test() {
    // The object must be uniform to match ObjPrint's pointer type.
    uniform MyObject obj;
    obj.x = 1;
    obj.y = 2;
    ObjPrint(&obj);
}

This is perfectly safe and recommended if you have objects you need to move around.

Being able to make objects within ispc is great for building large complex systems, but what if we want our ispc code to perform string manipulation or interface with a C/C++ library?

We can’t rewrite a C library in ispc just to make it work, so instead let’s use export and extern to interoperate between ispc, C, and C++ seamlessly!

Exporting is mostly used for functions, and that’s all the ispc manual says you can export. But did you know you can also export structs to C/C++? Even structs containing varyings! An easy way to think about externs and exports: extern functions and objects are defined in C/C++ code, while exports are functions and objects defined within ispc.
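For the extern direction, the C++ side is just a function with C linkage. The sketch below assumes a hypothetical helper called clamp_to_byte; the matching ispc declaration in the comment is also an assumption about how you would declare it, not something taken from a real project.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper implemented in C++ and given C linkage so its symbol
// name is not mangled. On the ispc side it would be declared (assumption) as:
//   extern "C" uniform int32 clamp_to_byte(uniform int32 v);
extern "C" std::int32_t clamp_to_byte(std::int32_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}
```

Keeping such helpers to plain C types (fixed-width integers, floats, pointers) avoids surprises at the language boundary.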

Example:

ispc

/* The struct and enum can be shared between ispc and C/C++ */
export enum ErrorCodes {
    NO_ERROR,
    ALLOC_ERROR,
    STAGING_ERROR
};
export struct Histogram {
    uniform int8 * uniform m_Data;
    uniform size_t m_DataSize;

    size_t m_Histo[256];
    varying size_t m_BytesReadPerProgram;
    uniform int m_ProgramCount;
};
export uniform ErrorCodes Create(uniform Histogram* uniform obj, uniform size_t data_size) {
    // Allocate and create buffer data
    obj->m_DataSize = data_size;
    obj->m_Data = uniform new int8[data_size];
    if (obj->m_Data == NULL) {
        return ALLOC_ERROR;
    }
    for (uniform size_t i = 0, b = 876543211; i < data_size; i++, b = ((b + 1) * 3) + (b >> 7)) {
        obj->m_Data[i] = (b + i) >> 13;
    }
    // Set stat variables
    obj->m_ProgramCount = programCount;

    obj->m_BytesReadPerProgram = 0;
    foreach(i = 0...256) {
        obj->m_Histo[i] = 0;
    }
    return NO_ERROR;
}
export uniform ErrorCodes Destroy(uniform Histogram* uniform obj) {
    if (obj->m_Data != NULL) {
        delete[] obj->m_Data;
    }
    else {
        return STAGING_ERROR;
    }
    return NO_ERROR;
}

export void Process(uniform Histogram* uniform obj) {
    uniform unsigned int8* uniform data = (uniform unsigned int8 * uniform)obj->m_Data;
    foreach(p = 0...obj->m_DataSize)
    {
        atomic_add_local(&obj->m_Histo[data[p]], 1);
        obj->m_BytesReadPerProgram++;
    }
}

C++

#include <stdio.h>
#include <string.h>
#include "my_ispc_header.h"

int main(int argc, char** argv) {
    using namespace ispc;
    Histogram obj;

    ErrorCodes err = Create(&obj, 654321);
    if (err != NO_ERROR)
        return err;

    Process(&obj);
    for (int i = 0; i < 256; i++)
        printf(" Symbol 0x%02x: Frequency %llu \n", i, obj.m_Histo[i]);
    for (int i = 0; i < obj.m_ProgramCount; i++)
        printf(" Program #%i, bytes read: %llu \n", i, obj.m_BytesReadPerProgram[i]);

    err = Destroy(&obj);
    if (err != NO_ERROR)
        return err;
    return NO_ERROR;
}

This example demonstrates how ispc can be integrated into C/C++ projects almost seamlessly in a C-style fashion, from error handling to object-oriented programming.

ispc may be a different language, but its output is ordinary object code, which means it interoperates with C and C++. We aren’t limited to writing ispc kernels that speed up a few functions; we can use ispc to write full libraries for data processing.

Future-proofing


ispc is an ideal language for heavy workloads since it is built to utilize every corner of a CPU’s architecture, from SIMD as a form of program execution to the integrated thread parallelism support via the launch and sync keywords. Given the trend towards significantly higher core counts in recent processors, ispc is a no-brainer when it comes to utilizing all the power at your disposal. In the future I see ispc gaining popularity and becoming a must-learn language for anyone working on optimization.

That’s all folks!