Old gem5 source code study notes

These are mainly study notes on the memory part of gem5. The notes are fairly rough, so please bear with them.

MemObject inherits from ClockedObject, ClockedObject inherits from SimObject, and SimObject is the top-level abstract class: it can represent any physical component and can be configured through configuration files.

SimObject initialization is controlled by the instantiate method in src/python/m5/simulate.py. There are slightly different initialization paths when starting the simulation afresh and when loading from a checkpoint. After instantiation and connecting ports, simulate.py initializes the object using the following call sequence:

  1. SimObject::init()
  2. SimObject::regStats()
  3. SimObject::initState() if starting afresh, or SimObject::loadState() if restoring from a checkpoint.
  4. SimObject::resetStats()
  5. SimObject::startup()
  6. Drainable::drainResume() if resuming from a checkpoint.
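For orientation, it is m5.instantiate() in a config script that kicks off this sequence. A minimal script showing where it sits:

import m5
from m5.objects import Root

root = Root(full_system=False)  # plus whatever SimObjects the script builds
m5.instantiate()                # runs the init/regStats/initState... sequence
exit_event = m5.simulate()
print('Exiting @ tick %i because %s' % (m5.curTick(), exit_event.getCause()))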

ClockedObject adds a clock and timing-related inherited functions, so the object can keep track of its own cycles.

MemObject, in turn, is an object that exposes accessors for master and slave ports while also having a clock. Of course, MemObject itself is quite abstract and cannot be used directly.

AbstractMemory inherits from MemObject and is the closest common ancestor of the various memory implementations. It represents a contiguous block of physical memory, has an associated address range, and provides basic read/write functionality. It has at least one slave port. It offers two methods for accessing memory: access and functionalAccess. The former updates the memory state according to the target address; the latter neither changes the memory itself nor records statistics for the call. Clearly, access is the more involved of the two. Its basic flow is outlined below (a small Python sketch follows the list):

  1. access takes a pointer of type PacketPtr, which carries everything the request needs.
  2. Check that the address range the request touches is a subset of this memory's range.
  3. Check the packet's MEM_INHIBIT flag; if it is set, another component (e.g. a cache) is already supplying the data, so the memory does not service the request.
  4. hostAddr computes the physical address within the backing store that the access targets.
  5. The packet can be in one of four states, three of which are valid:
    1. If the packet's command is MemCmd::SwapReq, it is a data-swap request: the data stored at the target address is placed into the packet, and the packet's payload goes into the temporary variable overwrite_val. There are two flavors of swap: an unconditional one, and a conditional one that only swaps if the data at the target location equals condition_val. After the swap, the code asserts that the request is not an instruction fetch and increments the numOther counter for the request's masterId.
    2. If the packet is a read request, the code first asserts it is not also a write. If the packet is of the LLSC type, trackLoadLocked is called to add the request to lockedAddrList. The data at the target address is then written into the packet, numReads for the masterId is incremented, and the number of bytes read is recorded. For instruction fetches, the fetched-bytes counter is bumped as well.
    3. If the packet is a write request, the code first checks whether the write may proceed (whether the request hits lockedAddrList and, if so, whether writing is still allowed), performs the write, and then updates the same statistics as above.
  6. If the packet needs a response, the response payload is filled in as well.
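To make the branching concrete, here is a self-contained Python sketch of the flow above. Packet and ToyMemory are simplified stand-ins invented for this note, not gem5 classes; the real logic is C++ in AbstractMemory::access (src/mem/abstract_mem.cc), and LLSC tracking plus the statistics updates are omitted.

class Packet:
    """Toy stand-in for the contents of gem5's PacketPtr."""
    def __init__(self, cmd, addr, size, data=None):
        self.cmd = cmd               # 'ReadReq', 'WriteReq' or 'SwapReq'
        self.addr = addr
        self.size = size
        self.data = data
        self.mem_inhibited = False   # set when a cache supplies the data
        self.needs_response = True

class ToyMemory:
    """Toy stand-in for AbstractMemory with a bytearray backing store."""
    def __init__(self, start, size):
        self.start, self.size = start, size
        self.pmem = bytearray(size)

    def access(self, pkt):
        # 2. the request must fall inside this memory's address range
        assert self.start <= pkt.addr
        assert pkt.addr + pkt.size <= self.start + self.size
        # 3. if a cache already claimed the response, memory stays silent
        if pkt.mem_inhibited:
            return
        # 4. hostAddr: offset of the request into the backing store
        off = pkt.addr - self.start
        if pkt.cmd == 'SwapReq':
            # 5.1 unconditional swap (the conditional variant also compares
            # the stored bytes against condition_val before overwriting)
            old = bytes(self.pmem[off:off + pkt.size])
            self.pmem[off:off + pkt.size] = pkt.data   # overwrite_val
            pkt.data = old
        elif pkt.cmd == 'ReadReq':
            # 5.2 read: copy memory into the packet
            pkt.data = bytes(self.pmem[off:off + pkt.size])
        elif pkt.cmd == 'WriteReq':
            # 5.3 write: copy packet data into memory
            self.pmem[off:off + pkt.size] = bytes(pkt.data)
        # 6. turn the request into a response if one is expected
        if pkt.needs_response:
            pkt.cmd = pkt.cmd.replace('Req', 'Resp')

mem = ToyMemory(0x1000, 64)
mem.access(Packet('WriteReq', 0x1010, 4, b'\x01\x02\x03\x04'))
rd = Packet('ReadReq', 0x1010, 4)
mem.access(rd)
assert rd.data == b'\x01\x02\x03\x04' and rd.cmd == 'ReadResp'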

Now, this part of the notes is about to be discarded!

Full System architecture walkthrough

I'm about to start reading the code of fs.py!

# Add options
parser = optparse.OptionParser()
Options.addCommonOptions(parser)
Options.addFSOptions(parser)


# Add the ruby specific and protocol specific options
if '--ruby' in sys.argv:
    Ruby.define_options(parser)

(options, args) = parser.parse_args()

if args:
    print("Error: script doesn't take any positional arguments")
    sys.exit(1)

This creates a parser and adds all of gem5's common options as well as the FS-mode options. If Ruby is to be used when building the system, the options predefined in the Ruby module must be added as well. parse_args then returns the corresponding options and args.
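The same optparse pattern in isolation, with option names invented for the demo (fs.py's real options come from Options.py):

import optparse

parser = optparse.OptionParser()
parser.add_option("--num-cpus", type="int", default=1)
parser.add_option("--ruby", action="store_true", default=False)

# parse_args() separates recognized options from leftover positional args
(options, args) = parser.parse_args(["--num-cpus", "2"])
assert options.num_cpus == 2 and options.ruby is False and args == []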

# system under test can be any CPU
(TestCPUClass, test_mem_mode, FutureClass) = Simulation.setCPUClass(options)

# Match the memories with the CPUs, based on the options for the test system
TestMemClass = Simulation.setMemClass(options)

setCPUClass returns two CPU classes, TestCPUClass and FutureClass. If checkpoint_restore or fast_forward is set in options, the cpu_type given in options is stored in FutureClass, and TestCPUClass becomes the type implied by the respective option. Normally, test_mem_mode corresponds to TestCPUClass.

setMemClass returns a memory controller class.
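A hedged sketch of the two cases, from my reading of configs/common/Simulation.py (details vary between gem5 versions):

# --cpu-type=DerivO3CPU
#   -> TestCPUClass = DerivO3CPU, FutureClass = None,
#      test_mem_mode = 'timing'
#
# --cpu-type=DerivO3CPU --fast-forward=1000000
#   -> TestCPUClass = AtomicSimpleCPU (runs the fast-forward phase),
#      FutureClass  = DerivO3CPU (switched in afterwards),
#      test_mem_mode = 'atomic'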

if options.benchmark:
    try:
        bm = Benchmarks[options.benchmark]
    except KeyError:
        print("Error benchmark %s has not been defined." % options.benchmark)
        print("Valid benchmarks are: %s" % DefinedBenchmarks)
        sys.exit(1)
else:
    if options.dual:
        bm = [SysConfig(disks=options.disk_image, rootdev=options.root_device,
                        mem=options.mem_size, os_type=options.os_type),
              SysConfig(disks=options.disk_image, rootdev=options.root_device,
                        mem=options.mem_size, os_type=options.os_type)]
    else:
        bm = [SysConfig(disks=options.disk_image, rootdev=options.root_device,
                        mem=options.mem_size, os_type=options.os_type)]

If options names a benchmark, the matching benchmark is looked up; if none is given, the default benchmark is created directly (two SysConfigs when --dual is set, one otherwise).

np = options.num_cpus

test_sys = build_test_system(np)

Creates the test system with the requested number of CPUs.

The build_test_system function is introduced below. It assembles the test system according to options.

def build_test_system(np):
    cmdline = cmd_line_template()
    if buildEnv['TARGET_ISA'] == "mips":
        test_sys = makeLinuxMipsSystem(test_mem_mode, bm[0], cmdline=cmdline)
    elif buildEnv['TARGET_ISA'] == "sparc":
        test_sys = makeSparcSystem(test_mem_mode, bm[0], cmdline=cmdline)
    elif buildEnv['TARGET_ISA'] == "riscv":
        test_sys = makeBareMetalRiscvSystem(test_mem_mode, bm[0],
                                            cmdline=cmdline)
    elif buildEnv['TARGET_ISA'] == "x86":
        test_sys = makeLinuxX86System(test_mem_mode, np, bm[0], options.ruby,
                                      cmdline=cmdline)
    elif buildEnv['TARGET_ISA'] == "arm":
        test_sys = makeArmSystem(
            test_mem_mode,
            options.machine_type,
            np,
            bm[0],
            options.dtb_filename,
            bare_metal=options.bare_metal,
            cmdline=cmdline,
            external_memory=options.external_memory_system,
            ruby=options.ruby,
            security=options.enable_security_extensions,
            vio_9p=options.vio_9p,
            bootloader=options.bootloader,
        )
        if options.enable_context_switch_stats_dump:
            test_sys.enable_context_switch_stats_dump = True
    else:
        fatal("Incapable of building %s full system!", buildEnv['TARGET_ISA'])
    ...

First, the system for the target ISA is built according to buildEnv. Since I need to build an x86 system, let's look at makeLinuxX86System next. Because makeLinuxX86System starts by calling makeX86System, that function comes first. At this call site the workload is an X86FsLinux instance; if workload were None, a plain X86FsWorkload would be created instead. If mdesc is None, SysConfig() builds a generic system configuration (in the full-system setup the configuration has of course already been specified).

def makeX86System(mem_mode, numCPUs=1, mdesc=None, workload=None, Ruby=False):
    self = System()

    if workload is None:
        workload = X86FsWorkload()
    self.workload = workload

    if not mdesc:
        # generic system
        mdesc = SysConfig()
    self.readfile = mdesc.script()

    self.mem_mode = mem_mode

On the x86 platform the region [0xC0000000, 0xFFFFFFFF] is reserved for devices, so when physical memory exceeds 3GB it is split into two parts.

# Physical memory
# On the PC platform, the memory region 0xC0000000-0xFFFFFFFF is reserved
# for various devices. Hence, if the physical memory size is greater than
# 3GB, we need to split it into two parts.
excess_mem_size = \
    convert.toMemorySize(mdesc.mem()) - convert.toMemorySize('3GB')
if excess_mem_size <= 0:
    self.mem_ranges = [AddrRange(mdesc.mem())]
else:
    warn("Physical memory size specified is %s which is greater than " \
         "3GB. Twice the number of memory controllers would be " \
         "created." % (mdesc.mem()))

    self.mem_ranges = [AddrRange('3GB'),
                       AddrRange(Addr('4GB'), size = excess_mem_size)]
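A quick worked example of the split, assuming --mem-size=4GB:

GiB = 1 << 30
excess_mem_size = 4 * GiB - 3 * GiB   # 1 GiB left over above the 3 GiB mark
mem_ranges = [(0 * GiB, 3 * GiB),     # [0, 3 GiB)
              (4 * GiB, 5 * GiB)]     # [4 GiB, 4 GiB + excess) = [4, 5) GiB
# the hole [3 GiB, 4 GiB) stays free for the devices mentioned above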

Pc() is the platform of the x86 architecture; it holds a pointer to the system and a south-bridge pointer. Incidentally, src/dev/x86 contains the handful of chip implementations x86 needs, and the south bridge module wires those chips together.

# Platform
self.pc = Pc()

Creating the memory system

Next, if Ruby is enabled, an x86 memory system with Ruby is created; otherwise the classic x86 memory system is built. Let's look at the classic memory system first.

# Create and connect the busses required by each memory system
if Ruby:
    connectX86RubySystem(self)
else:
    connectX86ClassicSystem(self, numCPUs)

Here we can see the rough architecture of a basic x86 system. The IO address space starts at 0x8000000000000000, the PCI configuration address space at 0xc000000000000000, and the interrupt address space at 0xa000000000000000. The memory buses of the x86 system are then created.

To summarize, data travels membus -> bridge -> iobus. The iobus master (mem_side_ports) port connects to many devices, and the address ranges the bridge will accept are pinned down here as well. The system also creates an apicbridge to support the flow iobus -> apicbridge -> membus; this bridge only covers the range from the interrupt base address up to numCPUs times the APIC range size (a quick sanity check of this window follows the code below).

def connectX86ClassicSystem(x86_sys, numCPUs):
    # Constants similar to x86_traits.hh
    IO_address_space_base = 0x8000000000000000
    pci_config_address_space_base = 0xc000000000000000
    interrupts_address_space_base = 0xa000000000000000
    APIC_range_size = 1 << 12;

    x86_sys.membus = MemBus()

    # North Bridge
    x86_sys.iobus = IOXBar()
    x86_sys.bridge = Bridge(delay='50ns')
    x86_sys.bridge.mem_side_port = x86_sys.iobus.cpu_side_ports
    x86_sys.bridge.cpu_side_port = x86_sys.membus.mem_side_ports
    # Allow the bridge to pass through:
    # 1) kernel configured PCI device memory map address: address range
    #    [0xC0000000, 0xFFFF0000). (The upper 64kB are reserved for m5ops.)
    # 2) the bridge to pass through the IO APIC (two pages, already
    #    contained in 1),
    # 3) everything in the IO address range up to the local APIC, and
    # 4) then the entire PCI address space and beyond.
    x86_sys.bridge.ranges = \
        [
        AddrRange(0xC0000000, 0xFFFF0000),
        AddrRange(IO_address_space_base,
                  interrupts_address_space_base - 1),
        AddrRange(pci_config_address_space_base,
                  Addr.max)
        ]

    # Create a bridge from the IO bus to the memory bus to allow access to
    # the local APIC (two pages)
    x86_sys.apicbridge = Bridge(delay='50ns')
    x86_sys.apicbridge.cpu_side_port = x86_sys.iobus.mem_side_ports
    x86_sys.apicbridge.mem_side_port = x86_sys.membus.cpu_side_ports
    x86_sys.apicbridge.ranges = [AddrRange(interrupts_address_space_base,
                                           interrupts_address_space_base +
                                           numCPUs * APIC_range_size
                                           - 1)]

    # connect the io bus
    x86_sys.pc.attachIO(x86_sys.iobus)

    x86_sys.system_port = x86_sys.membus.cpu_side_ports
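As a sanity check of the apicbridge window mentioned earlier, using the constants from the code, for numCPUs = 2:

interrupts_address_space_base = 0xa000000000000000
APIC_range_size = 1 << 12   # 4 KiB per local APIC
numCPUs = 2
end = interrupts_address_space_base + numCPUs * APIC_range_size - 1
print(hex(end))             # 0xa000000000001fff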
self.intrctrl = IntrControl() # removed in the latest gem5
# Disks
disks = makeCowDisks(mdesc.disks())
self.pc.south_bridge.ide.disks = disks
# Add in a Bios information structure.
structures = [X86SMBiosBiosInformation()]
workload.smbios_table.structures = structures
# Set up the Intel MP table
base_entries = []
ext_entries = []
for i in range(numCPUs):
    bp = X86IntelMPProcessor(
            local_apic_id = i,
            local_apic_version = 0x14,
            enable = True,
            bootstrap = (i == 0))
    base_entries.append(bp)
io_apic = X86IntelMPIOAPIC(
        id = numCPUs,
        version = 0x11,
        enable = True,
        address = 0xfec00000)
self.pc.south_bridge.io_apic.apic_id = io_apic.id
base_entries.append(io_apic)
# In gem5 Pc::calcPciConfigAddr(), it required "assert(bus==0)",
# but linux kernel cannot config PCI device if it was not connected to
# PCI bus, so we fix PCI bus id to 0, and ISA bus id to 1.
pci_bus = X86IntelMPBus(bus_id = 0, bus_type='PCI ')
base_entries.append(pci_bus)
isa_bus = X86IntelMPBus(bus_id = 1, bus_type='ISA ')
base_entries.append(isa_bus)
connect_busses = X86IntelMPBusHierarchy(bus_id=1,
        subtractive_decode=True, parent_bus=0)
ext_entries.append(connect_busses)
pci_dev4_inta = X86IntelMPIOIntAssignment(
        interrupt_type = 'INT',
        polarity = 'ConformPolarity',
        trigger = 'ConformTrigger',
        source_bus_id = 0,
        source_bus_irq = 0 + (4 << 2),
        dest_io_apic_id = io_apic.id,
        dest_io_apic_intin = 16)
base_entries.append(pci_dev4_inta)

def assignISAInt(irq, apicPin):
    assign_8259_to_apic = X86IntelMPIOIntAssignment(
            interrupt_type = 'ExtInt',
            polarity = 'ConformPolarity',
            trigger = 'ConformTrigger',
            source_bus_id = 1,
            source_bus_irq = irq,
            dest_io_apic_id = io_apic.id,
            dest_io_apic_intin = 0)
    base_entries.append(assign_8259_to_apic)
    assign_to_apic = X86IntelMPIOIntAssignment(
            interrupt_type = 'INT',
            polarity = 'ConformPolarity',
            trigger = 'ConformTrigger',
            source_bus_id = 1,
            source_bus_irq = irq,
            dest_io_apic_id = io_apic.id,
            dest_io_apic_intin = apicPin)
    base_entries.append(assign_to_apic)

assignISAInt(0, 2)
assignISAInt(1, 1)
for i in range(3, 15):
    assignISAInt(i, i)
workload.intel_mp_table.base_entries = base_entries
workload.intel_mp_table.ext_entries = ext_entries

return self
def makeLinuxX86System(mem_mode, numCPUs=1, mdesc=None, Ruby=False,
                       cmdline=None):
    # Build up the x86 system and then specialize it for Linux
    self = makeX86System(mem_mode, numCPUs, mdesc, X86FsLinux(), Ruby)

    # We assume below that there's at least 1MB of memory. We'll require 2
    # just to avoid corner cases.
    phys_mem_size = sum([r.size() for r in self.mem_ranges])
    assert(phys_mem_size >= 0x200000)
    assert(len(self.mem_ranges) <= 2)

    entries = \
        [
        # Mark the first megabyte of memory as reserved
        X86E820Entry(addr = 0, size = '639kB', range_type = 1),
        X86E820Entry(addr = 0x9fc00, size = '385kB', range_type = 2),
        # Mark the rest of physical memory as available
        X86E820Entry(addr = 0x100000,
                size = '%dB' % (self.mem_ranges[0].size() - 0x100000),
                range_type = 1),
        ]

    # Mark [mem_size, 3GB) as reserved if memory less than 3GB, which force
    # IO devices to be mapped to [0xC0000000, 0xFFFF0000). Requests to this
    # specific range can pass though bridge to iobus.
    if len(self.mem_ranges) == 1:
        entries.append(X86E820Entry(addr = self.mem_ranges[0].size(),
            size='%dB' % (0xC0000000 - self.mem_ranges[0].size()),
            range_type=2))

    # Reserve the last 64kB of the 32-bit address space for the m5op interface
    entries.append(X86E820Entry(addr=0xFFFF0000, size='64kB', range_type=2))

    # In case the physical memory is greater than 3GB, we split it into two
    # parts and add a separate e820 entry for the second part. This entry
    # starts at 0x100000000, which is the first address after the space
    # reserved for devices.
    if len(self.mem_ranges) == 2:
        entries.append(X86E820Entry(addr = 0x100000000,
            size = '%dB' % (self.mem_ranges[1].size()), range_type = 1))

    self.workload.e820_table.entries = entries

    # Command line
    if not cmdline:
        cmdline = 'earlyprintk=ttyS0 console=ttyS0 lpj=7999923 root=/dev/hda1'
    self.workload.command_line = fillInCmdline(mdesc, cmdline)
    return self
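Putting the entries together, my reading of the resulting e820 layout for --mem-size=2GB (a single memory range) is:

# [0x0,        0x9fc00)    type 1  usable     (639 kB)
# [0x9fc00,    0x100000)   type 2  reserved   (385 kB)
# [0x100000,   2 GiB)      type 1  usable
# [2 GiB,      3 GiB)      type 2  reserved, pushes PCI devices into the hole
# [0xFFFF0000, 4 GiB)      type 2  reserved   (m5ops)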
# Set the cache line size for the entire system
test_sys.cache_line_size = options.cacheline_size

# Create a top-level voltage domain
test_sys.voltage_domain = VoltageDomain(voltage = options.sys_voltage)

# Create a source clock for the system and set the clock period
test_sys.clk_domain = SrcClockDomain(clock = options.sys_clock,
                                     voltage_domain = test_sys.voltage_domain)

# Create a CPU voltage domain
test_sys.cpu_voltage_domain = VoltageDomain()

# Create a source clock for the CPUs and set the clock period
test_sys.cpu_clk_domain = SrcClockDomain(clock = options.cpu_clock,
                                         voltage_domain =
                                         test_sys.cpu_voltage_domain)

if buildEnv['TARGET_ISA'] == 'riscv':
    test_sys.workload.bootloader = options.kernel
elif options.kernel is not None:
    test_sys.workload.object_file = binary(options.kernel)

if options.script is not None:
    test_sys.readfile = options.script

if options.lpae:
    test_sys.have_lpae = True

if options.virtualisation:
    test_sys.have_virtualization = True

test_sys.init_param = options.init_param

# For now, assign all the CPUs to the same clock domain
test_sys.cpu = [TestCPUClass(clk_domain=test_sys.cpu_clk_domain, cpu_id=i)
                for i in range(np)]

if ObjectList.is_kvm_cpu(TestCPUClass) or \
        ObjectList.is_kvm_cpu(FutureClass):
    test_sys.kvm_vm = KvmVM()

if options.ruby:
    bootmem = getattr(test_sys, '_bootmem', None)
    Ruby.create_system(options, True, test_sys, test_sys.iobus,
                       test_sys._dma_ports, bootmem)

    # Create a seperate clock domain for Ruby
    test_sys.ruby.clk_domain = SrcClockDomain(clock = options.ruby_clock,
                                    voltage_domain = test_sys.voltage_domain)

    # Connect the ruby io port to the PIO bus,
    # assuming that there is just one such port.
    test_sys.iobus.master = test_sys.ruby._io_port.slave

    for (i, cpu) in enumerate(test_sys.cpu):
        #
        # Tie the cpu ports to the correct ruby system ports
        #
        cpu.clk_domain = test_sys.cpu_clk_domain
        cpu.createThreads()
        cpu.createInterruptController()

        cpu.icache_port = test_sys.ruby._cpu_ports[i].slave
        cpu.dcache_port = test_sys.ruby._cpu_ports[i].slave

        if buildEnv['TARGET_ISA'] in ("x86", "arm"):
            cpu.itb.walker.port = test_sys.ruby._cpu_ports[i].slave
            cpu.dtb.walker.port = test_sys.ruby._cpu_ports[i].slave

        if buildEnv['TARGET_ISA'] in "x86":
            cpu.interrupts[0].pio = test_sys.ruby._cpu_ports[i].master
            cpu.interrupts[0].int_master = test_sys.ruby._cpu_ports[i].slave
            cpu.interrupts[0].int_slave = test_sys.ruby._cpu_ports[i].master

else:
    if options.caches or options.l2cache:
        # By default the IOCache runs at the system clock
        test_sys.iocache = IOCache(addr_ranges = test_sys.mem_ranges)
        test_sys.iocache.cpu_side = test_sys.iobus.master
        test_sys.iocache.mem_side = test_sys.membus.slave
    elif not options.external_memory_system:
        test_sys.iobridge = Bridge(delay='50ns', ranges = test_sys.mem_ranges)
        test_sys.iobridge.slave = test_sys.iobus.master
        test_sys.iobridge.master = test_sys.membus.slave

    # Sanity check
    if options.simpoint_profile:
        if not ObjectList.is_noncaching_cpu(TestCPUClass):
            fatal("SimPoint generation should be done with atomic cpu")
        if np > 1:
            fatal("SimPoint generation not supported with more than one CPUs")

    for i in range(np):
        if options.simpoint_profile:
            test_sys.cpu[i].addSimPointProbe(options.simpoint_interval)
        if options.checker:
            test_sys.cpu[i].addCheckerCpu()
        if not ObjectList.is_kvm_cpu(TestCPUClass):
            if options.bp_type:
                bpClass = ObjectList.bp_list.get(options.bp_type)
                test_sys.cpu[i].branchPred = bpClass()
            if options.indirect_bp_type:
                IndirectBPClass = ObjectList.indirect_bp_list.get(
                    options.indirect_bp_type)
                test_sys.cpu[i].branchPred.indirectBranchPred = \
                    IndirectBPClass()
        test_sys.cpu[i].createThreads()

    # If elastic tracing is enabled when not restoring from checkpoint and
    # when not fast forwarding using the atomic cpu, then check that the
    # TestCPUClass is DerivO3CPU or inherits from DerivO3CPU. If the check
    # passes then attach the elastic trace probe.
    # If restoring from checkpoint or fast forwarding, the code that does this
    # for FutureCPUClass is in the Simulation module. If the check passes then
    # the elastic trace probe is attached to the switch CPUs.
    if options.elastic_trace_en and options.checkpoint_restore == None and \
        not options.fast_forward:
        CpuConfig.config_etrace(TestCPUClass, test_sys.cpu, options)

    CacheConfig.config_cache(options, test_sys)

    MemConfig.config_mem(options, test_sys)

return test_sys

Analysis of the x86 dev code

The x86 dev hierarchy is built around Pc, i.e. pc represents the x86 platform and inherits from the Platform class. It holds a system pointer, a south-bridge pointer, and an interrupt controller, and exposes four methods: post a console interrupt, clear a console interrupt, post a PCI interrupt, and clear a PCI interrupt. As the code shows, it creates two "non-existent" ports used by the Linux kernel for timing purposes, at 0x80 and 0xed, plus a fake device behind the PCI config/data registers at 0xcf8 (all of these are offset by the x86 IO base 0x8000000000000000).

class Pc(Platform):
    type = 'Pc'
    cxx_header = "dev/x86/pc.hh"
    system = Param.System(Parent.any, "system")

    south_bridge = SouthBridge()
    pci_host = PcPciHost()

    # "Non-existant" ports used for timing purposes by the linux kernel
    i_dont_exist1 = IsaFake(pio_addr=x86IOAddress(0x80), pio_size=1)
    i_dont_exist2 = IsaFake(pio_addr=x86IOAddress(0xed), pio_size=1)

    # Ports behind the pci config and data regsiters. These don't do anything,
    # but the linux kernel fiddles with them anway.
    behind_pci = IsaFake(pio_addr=x86IOAddress(0xcf8), pio_size=8)

    # Serial port and terminal
    com_1 = Uart8250()
    com_1.pio_addr = x86IOAddress(0x3f8)
    com_1.device = Terminal()

    # Devices to catch access to non-existant serial ports.
    fake_com_2 = IsaFake(pio_addr=x86IOAddress(0x2f8), pio_size=8)
    fake_com_3 = IsaFake(pio_addr=x86IOAddress(0x3e8), pio_size=8)
    fake_com_4 = IsaFake(pio_addr=x86IOAddress(0x2e8), pio_size=8)

    # A device to catch accesses to the non-existant floppy controller.
    fake_floppy = IsaFake(pio_addr=x86IOAddress(0x3f2), pio_size=2)

    def attachIO(self, bus, dma_ports = []):
        self.south_bridge.attachIO(bus, dma_ports)
        self.i_dont_exist1.pio = bus.master
        self.i_dont_exist2.pio = bus.master
        self.behind_pci.pio = bus.master
        self.com_1.pio = bus.master
        self.fake_com_2.pio = bus.master
        self.fake_com_3.pio = bus.master
        self.fake_com_4.pio = bus.master
        self.fake_floppy.pio = bus.master
        self.pci_host.pio = bus.default
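x86IOAddress() simply offsets a legacy port number into the dedicated IO address space, so for example com_1 above ends up at:

IO_address_space_base = 0x8000000000000000   # from x86_traits.hh
print(hex(IO_address_space_base + 0x3f8))    # 0x80000000000003f8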

After the IO devices are instantiated, they are attached to the IO bus. I suspect bus here is an IOXBar, which inherits from NoncoherentXBar and is in turn a child of BaseXBar. You can see that bus.master refers to mem_side_ports, and pci_host connects to the default port.

class BaseXBar(ClockedObject):
    type = 'BaseXBar'
    abstract = True
    cxx_header = "mem/xbar.hh"
    cxx_class = 'gem5::BaseXBar'

    cpu_side_ports = VectorResponsePort("Vector port for connecting "
                                        "mem side ports")
    slave = DeprecatedParam(cpu_side_ports,
                            '`slave` is now called `cpu_side_ports`')
    mem_side_ports = VectorRequestPort("Vector port for connecting "
                                       "cpu side ports")
    master = DeprecatedParam(mem_side_ports,
                             '`master` is now called `mem_side_ports`')
    ...
    # The default port can be left unconnected, or be used to connect
    # a default response port
    default = RequestPort("Port for connecting an optional default responder")
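Thanks to the DeprecatedParam aliases, old scripts keep working: these two assignments are equivalent, the first merely printing a deprecation warning (the device and port names here are placeholders):

# device.pio = xbar.master            # deprecated spelling
# device.pio = xbar.mem_side_ports    # current spelling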

The HMC part of gem5

The HMC code lives under src/mem.

HMC.py builds a complete HMC device, consisting of vault controllers, serial links, a main internal crossbar, and an external HMC controller.

  • vault controller: an instance of HMC_2500_1x32, which owns a DRAM controller and has the functionality defined in dram_ctrl.cc.
  • main xbar: a plain instance of NoncoherentXBar.
  • serial links controller: SerialLink is a simple variant of the Bridge class that models packet serialization delay plus controller latency. We assume the serializer component on the transmitter side does not need to receive the whole packet before starting serialization, whereas deserialization must wait for the entire packet to arrive so that its integrity can be checked first. (A wiring sketch follows this list.)
    • Bandwidth of the serial links is not modeled in the SerialLink component itself.
    • The serial link controller's latency is composed of the SerDes latency plus the link controller latency.
    • The serial links share the same address range and packets, so they need a load-distribution mechanism.
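A hedged, illustrative sketch (not the real HMC.py code) of how the pieces above are typically wired in a gem5 config; the parameter values are invented, and the SerialLink/NoncoherentXBar parameter sets differ between gem5 versions:

from m5.objects import NoncoherentXBar, SerialLink, AddrRange

# one crossbar quadrant in front of a group of vault controllers
xbar = NoncoherentXBar(width=32, frontend_latency=1,
                       forward_latency=1, response_latency=1)

# a serial link bridging the host controller to that quadrant
link = SerialLink(delay='4ns', ranges=[AddrRange('256MB')])
link.mem_side_port = xbar.cpu_side_ports   # port names as in recent gem5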
-----------------------------------------
|          Host/HMC Controller          |
|       ----------------------          |
|       |  Link Aggregator   |  opt     |
|       ----------------------          |
|       ----------------------          |
|       |  Serial Link + Ser | * 4      |
|       ----------------------          |
|---------------------------------------|
-----------------------------------------
|                Device                 |
|       ----------------------          |
|       |        Xbar        | * 4      |
|       ----------------------          |
|       ----------------------          |
|       |  Vault Controller  | * 16     |
|       ----------------------          |
|       ----------------------          |
|       |       Memory       |          |
|       ----------------------          |
|---------------------------------------|