I compared two identical EVGA GTX 570 2.5 GB cards using CUDA-Z. One of the cards had the EFI boot ROM modification performed by MacVidCards. The computer was the same; I just swapped the card and ran CUDA-Z, Xbench, and a subjective test playing back a Red 5K timeline in Adobe Premiere.
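The cards scored identically on everything except the Memory Copy tests between host and device. There the flashed card was about twice as fast with pinned memory (roughly 5680 MiB/s versus 2950 MiB/s) and about a quarter faster with pageable memory (roughly 3530 MiB/s versus 2795 MiB/s), in both directions. Device to Device was unchanged. I am assuming this is because the flashed card runs the PCIe link at double speed, but I am not an expert at these things.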
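To put a number on the gap for each test, here is a small Python sketch that divides the flashed card's memory copy rates by the stock card's. The figures are copied from the two CUDA-Z reports below; the function name `speedups` is just something made up for this example.

```python
# Memory copy rates in MiB/s, copied from the two CUDA-Z reports in this post.
stock = {
    "Host Pinned to Device": 2946.53,
    "Host Pageable to Device": 2791.7,
    "Device to Host Pinned": 2958.62,
    "Device to Host Pageable": 2799.05,
}
flashed = {
    "Host Pinned to Device": 5681.18,
    "Host Pageable to Device": 3533.28,
    "Device to Host Pinned": 5679.95,
    "Device to Host Pageable": 3529.13,
}


def speedups(before, after):
    """Return the after/before ratio for every test present in both reports."""
    return {name: after[name] / before[name] for name in before if name in after}


ratios = speedups(stock, flashed)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The pinned transfers come out a little over 1.9x and the pageable ones a little over 1.25x, so "twice as fast" really only applies to pinned memory.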
We tried playing back a Premiere timeline of 5K Red footage with effects at various resolutions, but both cards "felt" identical (neither played back at full resolution). The flashed card didn't subjectively seem any faster. If anyone has an idea how to stress the cards to show the difference, let me know and I will try to redo the test.
The OpenCL fix (libclh.dylib) made everything much slower, even things like Finder window redraws, so we uninstalled it. To be honest, I don't fully understand the OpenCL fix.
CUDA-Z Report Stock Card
=============
Version: 0.6.133 SVN Built Jun 25 2010 23:28:46 
http://cuda-z.sourceforge.net/
OS Version: Mac OS X 10.8 12A269
Driver Version: 8.0.51 295.30.00f01
Driver Dll Version: 5.0
Runtime Dll Version: 3.0
Core Information
----------------
	Name: GeForce GTX 570
	Compute Capability: 2.0
	Clock Rate: 1464 MHz
	Multiprocessors: 15
	Warp Size: 32
	Regs Per Block: 32768
	Threads Per Block: 1024
	Threads Dimensions: 1024 x 1024 x 64
	Grid Dimensions: 65535 x 65535 x 65535
	Watchdog Enabled: Yes
	Integrated GPU: No
	Concurrent Kernels: Yes
	Compute Mode: Default
Memory Information
------------------
	Total Global: 2559.69 MiB
	Shared Per Block: 48 KiB
	Pitch: 2048 MiB
	Total Constant: 64 KiB
	Texture Alignment: 512 B
	Texture 1D Size: 65536
	Texture 2D Size: 65536 x 65535
	Texture 3D Size: 2048 x 2048 x 2048
	GPU Overlap: Yes
	Map Host Memory: Yes
	Error Correction: No
Performance Information
-----------------------
Memory Copy
	Host Pinned to Device: 2946.53 MiB/s
	Host Pageable to Device: 2791.7 MiB/s
	Device to Host Pinned: 2958.62 MiB/s
	Device to Host Pageable: 2799.05 MiB/s
	Device to Device: 59.5438 GiB/s
GPU Core Performance
	Single-precision Float: 1392.71 Gflop/s
	Double-precision Float: 175.572 Gflop/s
	32-bit Integer: 700.618 Giop/s
	24-bit Integer: 699.786 Giop/s
Generated: Thu Aug 16 20:09:10 2012
CUDA-Z Report EFI Flashed MacVidCards card
=============
Version: 0.6.133 SVN Built Jun 25 2010 23:28:46 
http://cuda-z.sourceforge.net/
OS Version: Mac OS X 10.8 12A269
Driver Version: 8.0.51 295.30.00f01
Driver Dll Version: 5.0
Runtime Dll Version: 3.0
Core Information
----------------
	Name: GeForce GTX 570
	Compute Capability: 2.0
	Clock Rate: 1464 MHz
	Multiprocessors: 15
	Warp Size: 32
	Regs Per Block: 32768
	Threads Per Block: 1024
	Threads Dimensions: 1024 x 1024 x 64
	Grid Dimensions: 65535 x 65535 x 65535
	Watchdog Enabled: Yes
	Integrated GPU: No
	Concurrent Kernels: Yes
	Compute Mode: Default
Memory Information
------------------
	Total Global: 2559.56 MiB
	Shared Per Block: 48 KiB
	Pitch: 2048 MiB
	Total Constant: 64 KiB
	Texture Alignment: 512 B
	Texture 1D Size: 65536
	Texture 2D Size: 65536 x 65535
	Texture 3D Size: 2048 x 2048 x 2048
	GPU Overlap: Yes
	Map Host Memory: Yes
	Error Correction: No
Performance Information
-----------------------
Memory Copy
	Host Pinned to Device: 5681.18 MiB/s
	Host Pageable to Device: 3533.28 MiB/s
	Device to Host Pinned: 5679.95 MiB/s
	Device to Host Pageable: 3529.13 MiB/s
	Device to Device: 58.8025 GiB/s
GPU Core Performance
	Single-precision Float: 1387.29 Gflop/s
	Double-precision Float: 175.602 Gflop/s
	32-bit Integer: 700.318 Giop/s
	24-bit Integer: 699.442 Giop/s
Generated: Thu Aug 16 20:48:01 2012
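As a rough check on the link speed guess, the pinned Host to Device rates from the two reports above can be compared with the theoretical one-way rate of a PCIe link. This is only a sketch under assumptions that the reports themselves do not confirm: that the card sits in an x16 slot, that the stock card negotiates 2.5 GT/s (PCIe 1.x), and that the flashed card negotiates 5.0 GT/s (PCIe 2.0). Both generations use 8b/10b encoding, so 8 of every 10 transferred bits are payload. The helper name `pcie_payload_mib_s` is hypothetical.

```python
MIB = 1024 * 1024  # bytes per MiB, the unit CUDA-Z reports


def pcie_payload_mib_s(gt_per_s, lanes=16):
    """Theoretical one-way rate in MiB/s for a PCIe 1.x/2.0 link (8b/10b encoding)."""
    payload_bits_per_s = gt_per_s * 1e9 * lanes * 8 / 10
    return payload_bits_per_s / 8 / MIB


# ASSUMPTION: x16 slot, stock card at 2.5 GT/s, flashed card at 5.0 GT/s.
link_rate_gt = {"stock": 2.5, "flashed": 5.0}

# Measured "Host Pinned to Device" rates in MiB/s, from the reports above.
measured = {"stock": 2946.53, "flashed": 5681.18}

efficiency = {
    card: measured[card] / pcie_payload_mib_s(link_rate_gt[card])
    for card in measured
}

for card, fraction in efficiency.items():
    limit = pcie_payload_mib_s(link_rate_gt[card])
    print(f"{card}: {measured[card]:.0f} of {limit:.0f} MiB/s ({fraction:.0%})")
```

Under those assumptions both cards reach a similar fraction of their theoretical link rate, roughly three quarters, which is at least consistent with the flashed card running the same slot at twice the link speed. It does not prove it; checking the negotiated link speed in System Information would settle it.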