SSE 3, SSE 4 and AVX in one application

Discussion in 'Mac Programming' started by silvercircle, Apr 6, 2014.

  1. silvercircle macrumors member

    Nov 18, 2010
    How do I support SSE 3, SSE4 and AVX in one application/bundle?

    Do I check at launch what options (processor) are supported and then run a specific application from within the bundle? Are there other options to accomplish this? And how can I check which option is supported?

    If I select SSE 4.2 on my mid-2010 Mac Pro, the program runs a lot faster than when I select SSE 3. I want to offer the best and fastest option for every user.
  2. gnasher729 macrumors P6


    Nov 25, 2005
    The official way to check what is supported is by calling sysctl. I haven't used code that checks for the CPU type, but as an example:

    		// Requires #include <sys/types.h> and #include <sys/sysctl.h>.
    		// Get the number of processors, cores, and threads by calling sysctl. If a call to 
    		// sysctlbyname fails, then assume there is one processor, one core per processor, and one
    		// thread per core. 
    		size_t len;
    		unsigned int procCount;
    		unsigned int coreCount;
    		unsigned int threadCount;
    		if (sysctlbyname ("hw.packages", &procCount, (len = sizeof (procCount), &len), NULL, 0) != 0)
    			procCount = 1;
    		if (sysctlbyname ("hw.physicalcpu", &coreCount, (len = sizeof (coreCount), &len), NULL, 0) != 0)
    			coreCount = procCount;
    		if (sysctlbyname ("hw.logicalcpu", &threadCount, (len = sizeof (threadCount), &len), NULL, 0) != 0)
    			threadCount = coreCount;
    I'd probably put the performance critical code into a class (C++ or Objective-C) with subclasses that are compiled with different compiler options, as far as possible compiling identical code, and have some factory method returning an instance of the right class, depending on the processor that you have.
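    The same sysctlbyname call can be pointed at CPU feature flags rather than core counts. The following is a minimal sketch, assuming the hw.optional.* keys that macOS exposes on Intel Macs (hw.optional.sse3, hw.optional.sse4_1, hw.optional.sse4_2, hw.optional.avx1_0). The names cpu_has, best_simd_level, and simd_level are made up for illustration and are not part of any Apple API. On non-Apple systems the sketch simply reports the baseline.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical ordering of the instruction sets discussed in this thread. */
typedef enum {
    SIMD_BASELINE = 0,
    SIMD_SSE3,
    SIMD_SSE41,
    SIMD_SSE42,
    SIMD_AVX
} simd_level;

/* Returns 1 if the named sysctl flag exists and is nonzero, otherwise 0.
   Outside macOS this sketch does not call sysctlbyname and always returns 0. */
static int cpu_has(const char *name) {
    assert(name != NULL);
#ifdef __APPLE__
    int value = 0;
    size_t len = sizeof(value);
    if (sysctlbyname(name, &value, &len, NULL, 0) != 0)
        return 0; /* unknown key: treat the feature as absent */
    return value != 0;
#else
    (void)name;
    return 0;
#endif
}

/* Pick the highest level the CPU reports, checking the newest first. */
static simd_level best_simd_level(void) {
    if (cpu_has("hw.optional.avx1_0")) return SIMD_AVX;
    if (cpu_has("hw.optional.sse4_2")) return SIMD_SSE42;
    if (cpu_has("hw.optional.sse4_1")) return SIMD_SSE41;
    if (cpu_has("hw.optional.sse3"))   return SIMD_SSE3;
    return SIMD_BASELINE;
}
```

    A factory method like the one described above could call best_simd_level() once at launch and return the matching implementation.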
  3. MorphingDragon, Apr 6, 2014
    Last edited: Apr 6, 2014

    MorphingDragon macrumors 603


    Mar 27, 2009
    The World Inbetween
    If you're doing SIMD via intrinsics or assembly, the usual approach is to write multiple code paths for the program kernels that require SIMD, then choose the code path you need at runtime. More advanced applications use runtime code generation. As gnasher mentioned, this is usually wrapped in an application-layer class to abstract away the details.

    The code is untested; consider it C-style pseudocode.
    void Kernel_SSE3(args) {
       // SSE3 code
    }
    void Kernel_SSE4(args) {
       // SSE4 code
    }
    void Kernel_AVX(args) {
       // AVX code
    }
    void Kernel_FMA(args) {
       // FMA code
    }

    // Pointer to whichever kernel matches the CPU; set once at startup.
    void (*g_KernelFunction)(args) = NULL;

    int main(...) {
        int simdType = GetSIMDType(ReadProc());
        switch (simdType) {
            case SSE3:
                g_KernelFunction = Kernel_SSE3;
                break;
            // etc. for SSE4, AVX and FMA
        }
    }
    If you're letting the compiler do the SIMD code generation: A) Most compilers don't give you that much granularity, at least not easily. B) Don't rely on the compiler to output SIMD code. Hand-written code and your own brain are much better at that kind of optimization. Even the Intel compiler is terrible at SIMD optimization because it's impossible to get the necessary context at compile time.
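    For a compilable version of the function-pointer dispatch described above, here is a minimal sketch. It assumes GCC or Clang on x86, and uses the compiler's __builtin_cpu_supports builtin and the target function attribute in place of the GetSIMDType/ReadProc helpers from the pseudocode. The kernel bodies are plain C placeholders where hand-written intrinsics would go, and all function names (sum_baseline, sum_sse42, sum_avx, select_sum_kernel, sum) are made up for illustration. On other compilers or architectures the sketch falls back to the baseline kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: per-function targets and CPU builtins are GCC/Clang x86 features. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

typedef float (*sum_fn)(const float *data, size_t n);

/* Baseline kernel, built with the default compiler flags. */
static float sum_baseline(const float *data, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if HAVE_X86_DISPATCH
/* Placeholder SSE4.2 kernel: the compiler may use SSE4.2 inside this function. */
__attribute__((target("sse4.2")))
static float sum_sse42(const float *data, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

/* Placeholder AVX kernel: the compiler may use AVX inside this function. */
__attribute__((target("avx")))
static float sum_avx(const float *data, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

/* Choose the best kernel the running CPU supports, newest first. */
static sum_fn select_sum_kernel(void) {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))    return sum_avx;
    if (__builtin_cpu_supports("sse4.2")) return sum_sse42;
#endif
    return sum_baseline;
}

/* Global kernel pointer, resolved on first use. */
static sum_fn g_sum_kernel = NULL;

static float sum(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    if (g_sum_kernel == NULL) g_sum_kernel = select_sum_kernel();
    return g_sum_kernel(data, n);
}
```

    Callers only ever see sum(); the choice of kernel is made once and then costs a single indirect call.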
  4. Dranix macrumors 6502a


    Feb 26, 2011
    Gelnhausen, Germany
    Honestly, why care? All currently supported CPUs have at least SSE4.1, so simply compile for it.

    Or, if you do care, you could simply use OpenCL in CPU mode. The OpenCL compiler generates extremely nice SSE code.
  5. subsonix macrumors 68040

    Feb 2, 2008
    Depending on what you are doing, look into Apple's Accelerate framework. It picks the best option for the hardware you are running on, across all systems.
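    As an illustration of what using Accelerate looks like, here is a minimal sketch of a dot product that calls vDSP_dotpr from the framework's vDSP API on macOS. The wrapper name dot_product is made up for this example, and the scalar loop is only a stand-in so the code also builds on other systems. On macOS it would need to be linked with -framework Accelerate.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical wrapper: dot product of two float arrays of length n. */
static float dot_product(const float *a, const float *b, size_t n) {
    assert((a != NULL && b != NULL) || n == 0);
    float result = 0.0f;
#ifdef __APPLE__
    /* Stride 1 for both inputs; Accelerate selects the SIMD path itself. */
    vDSP_dotpr(a, 1, b, 1, &result, (vDSP_Length)n);
#else
    /* Scalar fallback for systems without Accelerate. */
    for (size_t i = 0; i < n; ++i) result += a[i] * b[i];
#endif
    return result;
}
```

    The application code contains no CPU checks at all here; any per-processor selection happens inside the framework.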
  6. MorphingDragon, Apr 6, 2014
    Last edited: Apr 6, 2014

    MorphingDragon macrumors 603


    Mar 27, 2009
    The World Inbetween
    Not always.

    AFAIK, OpenCL can't tell if there's array aliasing so only vector arithmetic is sped up, not loop optimization. There are some other issues like memory alignment.

    It depends on what he's trying to achieve. You shouldn't use loops in OpenCL anyway as it may run on the GPU if you just use the default device.
