
DearthnVader

Suspended
Original poster
Dec 17, 2015
2,207
6,394
Red Springs, NC
So I wanted to toy around with my Quad and see if I could get IOMMU for PCI passthrough working on it. Sadly, although dmesg has some references to an IOMMU and the vfio-pci modules do load, I don't get any IOMMU groups:

Code:
sudo dmesg | grep -i -e DMAR -e IOMMU
[sudo] password for jam:
[    0.000000] DART IOMMU initialized for U4 type chipset
[    0.033492] IOMMU table initialized, virtual merging enabled
[    2.308755] sata_svw 0001:03:0c.0: Using 32-bit DMA via iommu
[    2.310276] tg3 0001:05:04.0: Using 32-bit DMA via iommu
[    2.347684] nouveau 0001:06:00.0: Using 32-bit DMA via iommu
[    2.373602] tg3 0001:05:04.1: Using 32-bit DMA via iommu
[    2.718196] nouveau 0001:06:00.0: Using 32-bit DMA via iommu

Code:
lsmod | grep vfio
vfio_pci               53474  0
vfio_virqfd             4812  1 vfio_pci
vfio_iommu_spapr_tce    16747  0
vfio_spapr_eeh          2994  2 vfio_iommu_spapr_tce,vfio_pci
vfio                   30544  2 vfio_iommu_spapr_tce,vfio_pci

Code:
ls /sys/kernel/iommu_groups
none

So I suppose the U4 chipset doesn't support IOMMU for PCI passthrough, though people were able to do something called PCI-Proxy with Mac-On-Linux. I have a pure 64-bit kernel and userspace on my Quad, so no MOL here, but I do have a late 2005 Dual 2.3GHz with Mate 16.04 installed that MOL works on, so I'll update later once I've seen whether PCI-Proxy works with MOL on that machine.

Anyway, I wondered which PowerPC machines have a chipset that supports IOMMU for PCI passthrough, and I also wanted to test KVM on the Quad to see if I could pass all four cores to a virtual machine. Sadly, the mac99 machine in QEMU only supports one core, and even if we hack it to support more, OpenBIOS only puts one core in the device tree, and I haven't been able to get OpenBIOS to add more than one CPU to the device tree.

With that in mind, the pseries machine type in QEMU supports at least 4 cores, so I figured I'd play with that.




Code:
qemu-system-ppc64 --enable-kvm \
--machine pseries-2.1 -cpu host  \
-smp 4,cores=4,threads=1 \
-drive file=IBM.img,if=virtio \
-prom-env 'auto-boot?=false' \
-m 1024 -device ich9-usb-ehci2 \
-vga none -serial stdio

Note that the pseries-2.1 machine does support IOMMU for PCI passthrough:



Code:
Linux debian 4.19.0-4-powerpc64 #1 SMP Debian 4.19.28-2 (2019-03-15) ppc64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
jam@debian:~$ ./groups.sh
IOMMU Groups sys/kernel/iommu_groups/0:
    00:00.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 03)
    00:01.0 SCSI storage controller [0100]: Red Hat, Inc. Virtio block device [1af4:1001]
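
The groups.sh script itself isn't shown in the post. As a rough sketch only, a script producing similar output might look like the following. It assumes the usual sysfs layout (/sys/kernel/iommu_groups/<n>/devices/<pci-address>) and that lspci is installed; the function name list_iommu_groups and the fallback to the bare PCI address are illustrative choices, not the poster's actual script.

```shell
#!/bin/sh
# Hypothetical stand-in for groups.sh (the original script is not shown).
# list_iommu_groups [ROOT]: walk ROOT (default /sys/kernel/iommu_groups)
# and print each group with the devices it contains.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    found=0
    for group in "$root"/*; do
        # An unmatched glob stays literal, so skip anything without devices/
        [ -d "$group/devices" ] || continue
        found=1
        echo "IOMMU group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            addr="${dev##*/}"
            desc=""
            # lspci -nn -s ADDR describes one slot, including [vendor:device] IDs
            if command -v lspci >/dev/null 2>&1; then
                desc=$(lspci -nns "$addr" 2>/dev/null || true)
            fi
            # Fall back to the bare address if lspci gave nothing
            echo "    ${desc:-$addr}"
        done
    done
    [ "$found" -eq 1 ] || echo "no IOMMU groups under $root"
}
```

Calling list_iommu_groups with no argument reads the live sysfs tree; on a host like the Quad above, with an empty iommu_groups directory, it would only print the "no IOMMU groups" line.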

Code:
jam@debian:~$ sudo dmesg | grep -i -e DMAR -e IOMMU
[sudo] password for jam:
[    0.330824] IOMMU table initialized, virtual merging enabled
[    0.330974] iommu: Adding device 0000:00:00.0 to group 0
[    0.331085] iommu: Adding device 0000:00:01.0 to group 0

Code:
jam@debian:~$ cat /proc/cpuinfo
processor    : 0
cpu        : PPC970MP, altivec supported
clock        : 1250.000000MHz
revision    : 1.1 (pvr 0044 0101)

processor    : 1
cpu        : PPC970MP, altivec supported
clock        : 1250.000000MHz
revision    : 1.1 (pvr 0044 0101)

processor    : 2
cpu        : PPC970MP, altivec supported
clock        : 1250.000000MHz
revision    : 1.1 (pvr 0044 0101)

processor    : 3
cpu        : PPC970MP, altivec supported
clock        : 1250.000000MHz
revision    : 1.1 (pvr 0044 0101)

timebase    : 33333333
platform    : pSeries
model        : IBM pSeries (emulated by qemu)
machine        : CHRP IBM pSeries (emulated by qemu)
MMU        : Hash

It'd be nice to track down an old IBM pSeries server to play with.
 
Interesting stuff :)

I have wondered whether IOMMU PCIe passthrough is possible with the PCIe G5. Is there any way to explicitly test whether it does or does not support IOMMU, or whether it's just a software issue we're facing? Here are the datasheet and user manual for the CPC945 (U4) chipset :)

http://datasheet.datasheetarchive.com/originals/library/Datasheets-SW3/DSASW0048084.pdf

https://datasheet.datasheetarchive.com/originals/library/Datasheets-SW3/DSASW0048026.pdf

On the pSeries front, before spending money on one it might be worth making sure, if possible, that a real pSeries supports IOMMU and it's not just something QEMU is fudging in there.
 
The Linux kernel documentation doesn't list any way to turn IOMMU on or off for the PowerPC platform. There is only one reference to iommu=nobypass, and it's not really clear what that does.

IOMMU seems to be active by default in PowerPC kernels.

Also, it's not really clear what machine pseries-2.1 is emulating, and we know from the Mac machines that QEMU's hardware is all sorts of bastardized: just enough to get OSes running, not really tied 100% to any one real machine.

Interestingly, I did find two very reasonably priced Power 740 servers on Craigslist, it's just paying the shipping that would be a bitch.

https://reading.craigslist.org/sys/d/morgantown-ibm-power-740-server-t-8205/694835

Though it may be worth the drive up to PA from NC to pick one up. :D
 

Some interesting stuff in there. First:

11.2.3 PLLs

The clocks in CPC945 are generated by on-chip PLLs. These PLLs are used for two functions: to multiply the input clock frequency to a higher frequency and to remove clock tree insertion delay on the interfaces where insertion delay is important. The four PLLs in CPC945 generate all the clocks for the chip. Each PLL is surrounded by a power management wrapper, which controls the PLL through reset, system sleep, and software-enabled dynamic power management. All four PLL[n] Control registers (which are a part of this wrapper) are accessible via memory-mapped accesses and I2C accesses. I2C access is required for the cases where the SPU must modify a PLL's configuration before releasing the processors from reset.

Then:

(User Manual, CPC945 Bridge and Memory Controller, Preliminary, Power Management and Clocks, page 322 of 655, February 1, 2008)

11.6 PLL Programming

11.6.1 PLL1 and PLL2

PLL1 and PLL2 both use the same type of PLL macro. These PLLs are used to generate the PI and DDR2 clocks respectively. Figure 11-16 illustrates the major blocks within the PI/DDR2 PLL.

For the CPC945 application the feedback clock (FBCLK) is connected to the PLLOUTB pin of the PLL. This closes the PLL feedback loop and defines how the PLL dividers should be programmed for a given reference clock and target PLL output.

To determine the VCO frequency, we use the following equation:

VCO frequency = REFCLK x Feedback_Divider x Forward_DividerB

The resulting VCO frequency should be between 600 MHz and 1334 MHz. With the wide range of programmability offered on the outputs, several integer and non-integer relationships can be realized between the PLLOUTA and PLLOUTB outputs by controlling the A and B dividers. The frequency relationship is given as: [equation missing from this excerpt]

The PLLOUTA and PLLOUTB outputs are always synchronized to the rising edge (that is, the PLLOUTA rising edge coincides with the PLLOUTB rising edge at the start of the cycle). The PLLOUTC output is not used or connected in the CPC945.

So with a little tinkering on the I2C bus we should be able to get a 1334MHz system bus and a 2.7GHz CPU speed.
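
To put numbers on the VCO equation from the manual, here is a small sketch in shell arithmetic. The function names are made up, and the 33.333 MHz reference clock and the divider values are purely illustrative assumptions, not settings read from a real Quad. Only the formula and the 600 to 1334 MHz window come from the quoted text.

```shell
#!/bin/sh
# vco_khz REFCLK_KHZ FEEDBACK_DIV FORWARD_DIV_B
# VCO frequency = REFCLK x Feedback_Divider x Forward_DividerB (all in kHz)
vco_khz() {
    echo $(( $1 * $2 * $3 ))
}

# vco_in_range KHZ: succeeds when the VCO is inside the 600-1334 MHz window
vco_in_range() {
    [ "$1" -ge 600000 ] && [ "$1" -le 1334000 ]
}

# Illustrative values only (assumed, not measured): 33.333 MHz reference,
# feedback divider 10, forward divider B 2
khz=$(vco_khz 33333 10 2)
if vco_in_range "$khz"; then
    echo "$khz kHz: within the documented VCO range"
else
    echo "$khz kHz: outside the documented VCO range"
fi
```

Working in kHz keeps everything in integers, since plain sh arithmetic has no floating point.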
 
Some fun stuff in Open Firmware on the Quad:

Code:
 .properties
name                    system-clock
device_type             system-clock
built-in               
reg                     000000d4 
compatible              smu-pulsar
                        pulsar
                        i2c-hwclock
                        hwclock
vcore-latency           00000064
#address-cells          00000001
#size-cells             00000000
sysclk-spreading-pll2-table 00001300 80808080 80808057 07070707 07868686
                        85858504 04048383 02020201 01805707 86868504 04830202
                        01805707 86850483 02018080 57868504 83018057 86850483
                        02805786 85830201 57868583 02015786 04830180 d6858301
                        80d60483 01578604 0280d685 83015785 83015786 040280d6
                        85830157 86040201 57850402 01578604 830180d6 85040201
                        80d68504 02018057 86048302 01805786 85048302 01805786
                        85048383 02018057 07868585 04838302 01018080 57078686
                        85858504 04838383 83020202 02010101 01010101 80808080
                        80808080 01010101 01010102 02020283 83838304 04858585
                        86860707 10800101 02838304 85858607 07100102
                        ... 000001a4 bytes total
sysclk-spreading-pll2-index 00001578 80b08f49
sysclk-spreading-pll4-table 00003300 80808080 80808080 57070707 07078686
                        86868685 85850404 04838383 02020101 80805707 86868585
                        04048302 02018080 57868685 04830202 01805786 85850483
                        02018057 86850483 02018057 86048302 01805785 04830280
                        57868583 02015786 04830280 57850402 01578604 83015786
                        04830157 86040280 57858301 80d60402 80d68583 01578583
                        01578583 01578583 01570402 80d60402 57858301 d6040280
                        d58301d6 04015785 0280d683 01570402 80d58301 57850280
                        d6040280 d6040280 d6830157 86040280 d6040280 d6048301
                        57858302 80d68583 01578604 83015786 04830180 d6858302
                        80578504 830180d6 85040201 80578504 83020157
                        ... 00000150 bytes total
sysclk-spreading-pll4-index 00003578 80b00bc4
 ok
0 > words

ether-clks-init slew-low        slew-high       set-frequency   set-voltage
slew-init       slew-wait       voltage-ih-1    voltage-ih-0    slewing-done
create-clkgen-property          create-slewing-properties
get-clock-4-profile             get-clock-3-profile
get-clock-2-profile             setup-clock-4-profile
setup-clock-3-profile           setup-clock-2-profile
setup-clock-profile             .cpu-freq       spread-4-init   spread-3-init
spread-2-init   spread-init     turn-off-4-spreading
turn-off-3-spreading            turn-off-2-spreading
turn-off-spreading              .clk-data       set-of-slew-points
set-mdiv        get-cur-fmin    get-cur-mdiv    get-mdiv
get-slew-point  calc-fmin       calc-mdiv       soft-reset
profile-slewing-4-setup?        profile-slewing-3-setup?
profile-slewing-2-setup?        profile-slewing-setup?
profile-spreading-4-setup?      profile-spreading-3-setup?
profile-spreading-2-setup?      profile-spreading-setup?
write-slewing-4-profile         write-slewing-3-profile
write-slewing-2-profile         write-slewing-profile
write-spreading-4-profile       write-spreading-3-profile
write-spreading-2-profile       write-spreading-profile
write-clk-4-profile             write-clk-3-profile
write-clk-2-profile             write-clk-profile               wait-for-lock
wait-for-operational            wait-for-pulsar close           open
decode-unit     write-reg       read-reg        write-clk-byte  read-clk-byte
eeprom@4        eeprom@         set-addr        setup-i2c       read-i2c-at2
read-i2c-at     read-i2c        write-i2c  ok
Also, the cpus node has the set-dfs-high and set-dfs-low words, but I don't think the 970MP supports Dynamic Frequency Switching (DFS), only the 7447A and the 7448 (the latter not used in any production Mac).

On the iBook G4 and PowerBook G4 with 7447A CPUs, set-dfs-high and set-dfs-low will change the speed of the CPU, as seen with the spd word.

Though this isn't the case with the Quad G5:

Code:
0 > dev /cpus/@0  ok
0 > words

translate-64    translate       modify          unmap           map
map-64          release         claim           set-dfs-low     set-dfs-high
spd             close           open  ok
0 > spd 44  pll-ratio*2 = 15  HID1 = fd3c2000  DelayAACK = 29b35c3  GPIO1 = 4 BusClk = 1250000000  ok
2 > set-dfs-low  ok
2 > spd 43  pll-ratio*2 = 15  HID1 = fd3c2000  DelayAACK = 29b35c3  GPIO1 = 4 BusClk = 1250000000  ok
4 > set-dfs-high  ok
4 > spd 79  pll-ratio*2 = 15  HID1 = fd3c2000  DelayAACK = 29b35c3  GPIO1 = 4 BusClk = 1250000000  ok
6 > see set-dfs-high
:
set-dfs-high             
  ^ff89.aec8  if
    5 1 gpio! 1 ms hid1@ 1 1f 9 - lshift andc hid1!
    then
  ; ok
6 > see set-dfs-low
:
set-dfs-low               
  ^ff89.aec8  if
    hid1@ 1 1f 9 - lshift or hid1! 1 ms 4 1 gpio!
    then
  ; ok
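
The two definitions differ only in whether one HID1 bit is set or cleared. Here is a sketch of that arithmetic in shell, assuming Open Firmware's usual hexadecimal number base (so 1f is 31); the function names are made up for illustration. `1 1f 9 - lshift` is then 1 << 22, i.e. the mask 0x00400000; set-dfs-low ORs it into HID1 and set-dfs-high clears it with andc.

```shell
#!/bin/sh
# Mask built by "1 1f 9 - lshift", assuming hex input: 1 << (0x1f - 9)
hid1_mask() {
    echo $(( 1 << (0x1f - 9) ))
}

# dfs_low HID1_HEX: what "hid1@ <mask> or hid1!" would write back
dfs_low() {
    printf '%08x\n' $(( 0x$1 | $(hid1_mask) ))
}

# dfs_high HID1_HEX: what "hid1@ <mask> andc hid1!" would write back
dfs_high() {
    printf '%08x\n' $(( 0x$1 & ~$(hid1_mask) ))
}

printf 'mask = 0x%08x\n' "$(hid1_mask)"
dfs_low fd3c2000
dfs_high fd7c2000
```

With the HID1 value from the spd output above, dfs_low fd3c2000 yields fd7c2000. The spd output after set-dfs-low still shows fd3c2000, which would fit the guarding `if` not having run on the Quad, though that is only an inference.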

So I assume we'll have to figure out how to use the clock words on the I2C bus listed earlier.

slew-low and slew-high seem pretty straightforward.
 
The 970, 970FX and 970MP do support some form of reducing clock speeds when workloads are idle.

I just can't recall if this is done by reducing a multiplier, reducing the bus speed, or a combination of both.

(I know my 2GHz 2004 PM7,3 will reduce its clock speed to 1304MHz and my 2005 PM7,3 2.7GHz will reduce its clock speed to 2GHz. I sadly can't recall what my G5 Quad does; it's been a good while since I had it out. I should unbury it at some point and play with it :) )
 
Fairly sure G5s reduce the system speed via bus slewing (slew-low).

We know the 970FX supports a 3:1 multiplier, but I doubt that can be set via software.
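
On the Linux side, one way to see which low and high speeds a given G5 actually switches between is the standard cpufreq sysfs directory. This is a sketch only: it assumes a cpufreq driver is bound on the machine (not verified for the Quad in this thread), and the function name show_cpufreq is made up.

```shell
#!/bin/sh
# show_cpufreq [CPU_DIR]: print min, max and current CPU frequency in MHz
# from the cpufreq sysfs files (values there are in kHz).
show_cpufreq() {
    dir="${1:-/sys/devices/system/cpu/cpu0}/cpufreq"
    if [ ! -d "$dir" ]; then
        echo "no cpufreq directory at $dir"
        return 0
    fi
    for f in cpuinfo_min_freq cpuinfo_max_freq scaling_cur_freq; do
        # Some files may be missing or unreadable depending on the driver
        [ -r "$dir/$f" ] || continue
        khz=$(cat "$dir/$f")
        echo "$f: $(( khz / 1000 )) MHz"
    done
}
```

Running show_cpufreq with no argument looks at cpu0; pass another CPU directory such as /sys/devices/system/cpu/cpu1 to check a different core.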
 