[kernel-xen] Greetings, Bug, and Broken Link, and Small Kernel Config Change Request

Sat Apr 27 20:06:14 EST 2013

Hi,

First of all, thank you very much for picking up the ball where Red Hat
dropped it WRT Xen EL6 kernels. I can relate to the frustration (I felt
the same about the lack of EL6 for ARM and did something about it).

The mailing list link on the support page:

http://xen.crc.id.au/support/

appears to be broken - it is pointing at:

http://xen.crc.id.au/support/mailing_list/

which results in a 404.

I only found the link to the list with a bit of Google-fu on your blog here:

https://www.crc.id.au/2013/03/17/libvirt-1-0-3-for-rhel6-centos-6-available-for-testing/

If the mailing list hasn't been growing of late, that _could_ be why. :)

Secondly, I believe I have found a bug. The last version of
xen-hypervisor with which I had PCI/VGA passthrough working was 4.2.1-6.

Later versions fail with error 22 (invalid argument) when starting
the VM. So:

Works:
xen-hypervisor-4.2.1-6.el6.x86_64.rpm

Don't work:
xen-hypervisor-4.2.1-7.el6.x86_64.rpm
xen-hypervisor-4.2.2-1.el6.x86_64.rpm

It is this specific package (/boot/xen.gz) that seems to be responsible;
I am running the latest 4.2.2-1 versions of all the other Xen packages.

Any idea what is going wrong here?

Finally, I would like to request the following changes to the kernel
configuration, if they wouldn't break things for too many people:

< CONFIG_NR_CPUS=8
> CONFIG_NR_CPUS=32

This would let dom0 use all CPUs on dual 8-core/16-thread systems if
required. If this is deemed undesirable, the nr_cpus kernel boot
parameter could be used to limit it. (That parameter does not appear to
be able to raise the number of available cores beyond what is set in
CONFIG_NR_CPUS, though. This had me scratching my head for a while,
trying to figure out why I could only see 8 CPUs on my 24-thread machine.)
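A minimal sketch of the behaviour described above, assuming (as observed
on this box, not verified against the kernel source) that CONFIG_NR_CPUS
is a hard compile-time ceiling and nr_cpus= can only lower the count
further. The function names and config snippets are hypothetical; the
24-thread figure matches the machine mentioned above.

```python
import re
from typing import Optional


def config_nr_cpus(config_text: str) -> Optional[int]:
    """Return the CONFIG_NR_CPUS value from kernel .config text, or None if unset."""
    match = re.search(r"^CONFIG_NR_CPUS=(\d+)$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def usable_cpus(hw_threads: int, config_limit: int, nr_cpus: Optional[int] = None) -> int:
    """Estimate how many CPUs dom0 can bring up.

    Assumption: the compile-time CONFIG_NR_CPUS caps the count, and the
    nr_cpus= boot parameter can only reduce it, never raise it.
    """
    limit = min(hw_threads, config_limit)
    if nr_cpus is not None:
        limit = min(limit, nr_cpus)
    return limit


if __name__ == "__main__":
    current = config_nr_cpus("CONFIG_SMP=y\nCONFIG_NR_CPUS=8\n")
    proposed = config_nr_cpus("CONFIG_SMP=y\nCONFIG_NR_CPUS=32\n")
    print(usable_cpus(24, current))              # 8: capped by the build
    print(usable_cpus(24, current, nr_cpus=24))  # 8: nr_cpus cannot raise it
    print(usable_cpus(24, proposed))             # 24: all threads visible
    print(usable_cpus(24, proposed, nr_cpus=8))  # 8: nr_cpus lowers it
```

With the proposed value of 32, a 24-thread box sees everything by default
and anyone who wants fewer CPUs in dom0 can still trim them at boot.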

< CONFIG_HOTPLUG_PCI_PCIE=y
> CONFIG_HOTPLUG_PCI_PCIE=m

Unfortunately, there is some buggy hardware out there (including the 
EVGA SR-2 I'm running Xen on) that suffers from this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=908023
https://bugzilla.redhat.com/show_bug.cgi?id=529153

With pciehp built into the kernel, the only working workaround I have
found is the one described in the bug report, and it involves manually
editing init in the initramfs to unbind the offending device from pciehp
at the earliest possible point.

With pciehp built as a module, it could either be blacklisted or handled
in a way that ensures it doesn't kill the machine as soon as it loads,
while still remaining available to people who actually need it.
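To tell which case a given kernel falls into, a small check along these
lines could classify the CONFIG_HOTPLUG_PCI_PCIE value from the installed
kernel config (usually /boot/config-<version> on EL6). The function name
is hypothetical, and it only reports the build mode; it does not
blacklist or unload anything.

```python
import re


def pciehp_build_mode(config_text: str) -> str:
    """Classify CONFIG_HOTPLUG_PCI_PCIE in kernel .config text.

    "built-in" (=y): cannot be blacklisted; needs the initramfs workaround.
    "module"   (=m): can, in principle, be kept from loading.
    "absent":        option not enabled at all.
    """
    match = re.search(r"^CONFIG_HOTPLUG_PCI_PCIE=([ym])$", config_text, re.MULTILINE)
    if match is None:
        return "absent"
    return "built-in" if match.group(1) == "y" else "module"


if __name__ == "__main__":
    print(pciehp_build_mode("CONFIG_HOTPLUG_PCI_PCIE=y\n"))  # built-in
    print(pciehp_build_mode("CONFIG_HOTPLUG_PCI_PCIE=m\n"))  # module
    print(pciehp_build_mode("# CONFIG_HOTPLUG_PCI_PCIE is not set\n"))  # absent
```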

Gordan