bp1000, dude, what about the CUDA-Z snapshot I asked you for earlier?
Core Information
----------------
Name: GeForce GTX 780M
Compute Capability: 3.0
Clock Rate: 784 MHz
PCI Location: 0:1:0
Multiprocessors: 8 (1536 Cores)
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Memory Information
------------------
Total Global: 4095.56 MiB
Bus Width: 256 bits
Clock Rate: 2500 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65536
Texture 3D Size: 4096 x 4096 x 4096
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: No
Async Engine: Yes, Bidirectional
Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 6266.61 MiB/s
Host Pageable to Device: 5378.67 MiB/s
Device to Host Pinned: 6331.24 MiB/s
Device to Host Pageable: 5139.21 MiB/s
Device to Device: 16.3972 GiB/s
GPU Core Performance
Single-precision Float: 1048.59 Gflop/s
Double-precision Float: 77.8095 Gflop/s
32-bit Integer: 320.241 Giop/s
24-bit Integer: 325.997 Giop/s
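
For a rough sanity check on the single-precision number, here is a small sketch that compares it with a theoretical peak derived from the core count and clock in the report. It assumes each core can retire one fused multiply-add (counted as 2 flops) per cycle and that the 784 MHz reported clock is the one the benchmark actually ran at; neither is stated in the report, and `peak_gflops` is just a helper name made up for this example.

```python
# Figures copied from the CUDA-Z report above.
cores = 1536                  # "Multiprocessors: 8 (1536 Cores)"
clock_mhz = 784               # "Clock Rate: 784 MHz"
measured_sp_gflops = 1048.59  # "Single-precision Float"

# Assumption (not from the report): one fused multiply-add per core
# per cycle, counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_gflops(cores, clock_mhz, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Theoretical peak in Gflop/s: cores * clock (MHz) * flops per cycle / 1000."""
    return cores * clock_mhz * flops_per_cycle / 1000.0


peak = peak_gflops(cores, clock_mhz)
print(f"theoretical peak: {peak:.1f} Gflop/s")
print(f"measured / peak:  {measured_sp_gflops / peak:.0%}")
```

Under those assumptions the peak works out to about 2408 Gflop/s, so the measured figure is a bit under half of it. If the card boosts above or throttles below 784 MHz during the test, the ratio shifts accordingly.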