PA1
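The Opening Chapter: The Simplest Computer
Once you have a rough picture of the directory tree above, you can start reading the code. As for where to begin, that hardly needs saying:
the main function!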
There are two ways to use ccache. You can either prefix your compilation commands with ccache or you can let ccache masquerade as the compiler by creating a symbolic link (named as the compiler) to ccache. The first method is most convenient if you just want to try out ccache or wish to use it for some specific projects. The second method is most useful for when you wish to use ccache for all your compilations.
To use the first method, just make sure that ccache is in your PATH.
To use the second method on a Debian system, it’s easiest to just prepend /usr/lib/ccache to your PATH. /usr/lib/ccache contains symlinks for all compilers currently installed as Debian packages.
Alternatively, you can create any symlinks you like yourself like this:
ln -s /usr/bin/ccache /usr/local/bin/gcc
ln -s /usr/bin/ccache /usr/local/bin/g++
ln -s /usr/bin/ccache /usr/local/bin/cc
ln -s /usr/bin/ccache /usr/local/bin/c++
And so forth. This will work as long as the directory with symlinks comes before the path to the compiler (which is usually in /usr/bin). After installing you may wish to run “which gcc” to make sure that the correct link is being used.
Warning
The technique of letting ccache masquerade as the compiler works well, but currently doesn’t interact well with other tools that do the same thing. See USING CCACHE WITH OTHER COMPILER WRAPPERS.
Warning
Use symbolic links for masquerading, not hard links.
So I added the following line to my ~/.bashrc to prepend /usr/lib/ccache to my PATH:
export PATH=/usr/lib/ccache:$PATH
If you want this to work inside tmux as well, also add the line to ~/.bash_profile or whichever file your login shell reads.
Run source ~/.bashrc to reload it, then run which gcc. It worked for me:
/usr/lib/ccache/gcc
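[Warning] ⚠ According to the manual, the export must prepend /usr/lib/ccache to PATH, not append it.
An example of appending:
export PATH=$PATH:/usr/lib/ccache
As the manual excerpt above notes, the directory with the symlinks has to come before the real compiler's directory (usually /usr/bin) in PATH. Otherwise the shell finds the real gcc first and ccache is never invoked.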
RTFSC
Configuration system and project build
The kconfig configuration system
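When you type
make menuconfig
the following happens behind the scenes:
Check whether the program nemu/tools/kconfig/build/mconf exists; if it does not, compile and generate mconf.
Check whether the program nemu/tools/kconfig/build/conf exists; if it does not, compile and generate conf.
Run the command mconf nemu/Kconfig. mconf parses the descriptions in nemu/Kconfig and presents the configuration options as a menu tree for the developer to choose from.
On exiting the menu, mconf records the developer's choices in the file nemu/.config.
Run the command conf --syncconfig nemu/Kconfig. conf parses the descriptions in nemu/Kconfig, reads the choices from nemu/.config, and combines the two to generate the following files:
  Macro definitions that can be included in C code (nemu/include/generated/autoconf.h); these macros all have names of the form CONFIG_xxx.
  Variable definitions that can be included in a Makefile (nemu/include/config/auto.conf).
  Dependency rules related to the "configuration description files" that can be included in a Makefile (nemu/include/config/auto.conf.cmd); for the purpose of reading the code, we can ignore this file.
  A directory tree, nemu/include/config/, that uses timestamps to track changes to configuration options. It works together with another tool, nemu/tools/fixdep, to avoid unnecessary recompilation after options are updated; for the purpose of reading the code, we can ignore it as well.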
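In short, mconf displays the configuration options and records the developer's choices in the .config file, while conf combines the Kconfig descriptions with the options in .config to generate the macros, the Makefile variable definitions and dependency rules, and to maintain the directory tree that tracks option changes.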
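As a rough illustration of how a generated CONFIG_xxx macro ends up steering C code, the sketch below hard-codes one macro where a real build would include nemu/include/generated/autoconf.h. The option name CONFIG_DEMO_TRACE and the function demo_trace_enabled are hypothetical and are not taken from NEMU.

```c
/* Stand-in for a line that conf would write into autoconf.h.
 * CONFIG_DEMO_TRACE is a hypothetical option used only for this sketch. */
#define CONFIG_DEMO_TRACE 1

/* Returns 1 when the (hypothetical) option is defined, 0 otherwise. */
int demo_trace_enabled(void) {
#ifdef CONFIG_DEMO_TRACE
  return 1;
#else
  return 0;
#endif
}
```

Removing the #define line models deselecting the option in menuconfig: the same source then compiles to the other branch.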
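Project build and the Makefile
NEMU's Makefile provides the following features:
Association with the configuration system: by including nemu/include/config/auto.conf, the Makefile picks up the variables generated by kconfig. As a result, the Makefile's behavior may change after the options are updated through menuconfig.
File list (filelist): the file list determines which source files finally take part in compilation. Under nemu/src and its subdirectories there are files named filelist.mk; they maintain some variables according to the menuconfig configuration, and the Makefile finally gathers them into the set of source files to compile.
Compilation and linking.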
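The first guest program
When NEMU starts running, it first calls the init_monitor() function (defined in nemu/src/monitor/monitor.c) to perform some monitor-related initialization.
The main function in src/nemu-main.c is the entry point.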
parse_args(argc, argv)
From man 3 getopt_long:
#include <unistd.h>
int getopt(int argc, char * const argv[], const char *optstring);
extern char *optarg;
extern int optind, opterr, optopt;
Note that optarg is declared as a char * here.
#include <getopt.h>
int getopt_long(int argc, char * const argv[], const char *optstring, const struct option *longopts, int *longindex);
The getopt() function parses the command-line arguments. Its arguments argc and argv are the argument count and array as passed to the main() function on program invocation. An element of argv that starts with '-' (and is not exactly "-" or "--") is an option element. The characters of this element (aside from the initial '-') are option characters. If getopt() is called repeatedly, it returns successively each of the option characters from each of the option elements.
That is why we see a while loop in parse_args(): it parses the options one by one.
The variable optind is the index of the next element to be processed in argv. The system initializes this value to 1. The caller can reset it to 1 to restart scanning of the same argv, or when scanning a new argument vector.
If getopt() finds another option character, it returns that character, updating the external variable optind and a static variable nextchar so that the next call to getopt() can resume the scan with the following option character or argv-element.
The man page says it returns the character, so why is the return type int instead of char? Is there any benefit?
If there are no more option characters, getopt() returns -1. Then optind is the index in argv of the first argv-element that is not an option.
So int is used so that -1 can signal "no more option characters" without colliding with any valid option character, which as an (unsigned) char lies in the range 0 to 255. That explains the while condition:
while ( (o = getopt_long(argc, argv, "-bhl:d:p:", table, NULL)) != -1) {
optstring is a string containing the legitimate option characters. If such a character is followed by a colon, the option requires an argument,
So each option in table that is marked required_argument is followed by a colon (:) in the optstring of the call above:
const struct option table[] = {
  {"batch", no_argument      , NULL, 'b'},
  {"log"  , required_argument, NULL, 'l'},
  {"diff" , required_argument, NULL, 'd'},
  {"port" , required_argument, NULL, 'p'},
  {"help" , no_argument      , NULL, 'h'},
  {0      , 0                , NULL,  0 },
};
so getopt() places a pointer to the following text in the same argv-element, or the text of the following argv-element, in optarg.
So optarg is used here to read each option's value:
case 'p': sscanf(optarg, "%d", &difftest_port); break;
case 'l': log_file = optarg; break;
case 'd': diff_so_file = optarg; break;
case 1: img_file = optarg; return 0;
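The pieces discussed so far (the int return value, the colon in optstring, and optarg) can be put together in a small sketch. It assumes glibc's getopt_long(), where a leading '-' in optstring makes each non-option argument come back with return value 1, which is what the case 1 branch relies on. The struct demo_args and the function demo_parse are hypothetical names, and the function is meant to be called once per process because it does not reset optind.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the parsed values; not a NEMU type. */
struct demo_args {
  int batch;
  const char *log_file;
  const char *img_file;
};

/* Parse argv in the same style as parse_args(), but into a struct.
 * Returns 0 on success and -1 on an unknown option or missing argument. */
int demo_parse(int argc, char *argv[], struct demo_args *out) {
  static const struct option table[] = {
    {"batch", no_argument,       NULL, 'b'},
    {"log",   required_argument, NULL, 'l'},
    {0,       0,                 NULL,  0 },
  };
  int o;
  out->batch = 0;
  out->log_file = NULL;
  out->img_file = NULL;
  /* The leading '-' asks for non-option arguments to be returned as 1. */
  while ((o = getopt_long(argc, argv, "-bl:", table, NULL)) != -1) {
    switch (o) {
      case 'b': out->batch = 1; break;
      case 'l': out->log_file = optarg; break;        /* "l:" requires a value */
      case 1:   out->img_file = optarg; return 0;     /* first non-option */
      default:  return -1;
    }
  }
  return 0;
}
```

Calling it with the arguments -b --log out.txt image.bin should set batch, store "out.txt" as the log file, and stop at "image.bin" as the image file.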
The getopt_long() function works like getopt() except that it also accepts long options, started with two dashes. (If the program accepts only long options, then optstring should be specified as an empty string (""), not NULL.)
The struct option described below is exactly the element type of table above:
longopts is a pointer to the first element of an array of struct option declared in <getopt.h> as
struct option {
  const char *name;
  int has_arg;
  int *flag;
  int val;
};
The meanings of the different fields are:
name: the name of the long option.
has_arg:
no_argument (or 0) if the option does not take an argument;
required_argument (or 1) if the option requires an argument;
optional_argument (or 2) if the option takes an optional argument.
flag: specifies how results are returned for a long option.
If flag is NULL, then getopt_long() returns val. (For example, the calling program may set val to the equivalent short option character.)
Otherwise, getopt_long() returns 0, and flag points to a variable which is set to val if the option is found, but left unchanged if the option is not found.
val: is the value to return, or to load into the variable pointed to by flag.
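NEMU's table always leaves flag as NULL, so the non-NULL case is easy to overlook. The following sketch, with the hypothetical names demo_verbose and demo_parse_flag, shows that case: getopt_long() stores val into the pointed-to variable and returns 0 instead of a character.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical variable that getopt_long() sets directly through flag. */
static int demo_verbose = 0;

/* Scan argv and return how many times getopt_long() returned 0,
 * i.e. how many flag-style long options were seen. */
int demo_parse_flag(int argc, char *argv[]) {
  static const struct option table[] = {
    {"verbose", no_argument, &demo_verbose, 1},
    {0,         0,           NULL,          0},
  };
  int o;
  int zeros = 0;
  /* Only long options are accepted, so optstring is "" rather than NULL. */
  while ((o = getopt_long(argc, argv, "", table, NULL)) != -1) {
    if (o == 0) zeros++;
  }
  return zeros;
}
```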
init_rand()
In src/utils/rand.c:
void init_rand() {
  srand(MUXDEF(CONFIG_TARGET_AM, 0, time(0)));
}
In include/macro.h:
// macro concatenation
#define concat_temp(x, y) x ## y
#define concat(x, y) concat_temp(x, y)
// macro testing
// See https://stackoverflow.com/questions/26099745/test-if-preprocessor-symbol-is-defined-inside-macro
#define CHOOSE2nd(a, b, ...) b
#define MUX_WITH_COMMA(contain_comma, a, b) CHOOSE2nd(contain_comma a, b)
#define MUX_MACRO_PROPERTY(p, macro, a, b) MUX_WITH_COMMA(concat(p, macro), a, b)
// define placeholders for some property
#define __P_DEF_0 X,
#define __P_DEF_1 X,
#define __P_ONE_1 X,
#define __P_ZERO_0 X,
// define some selection functions based on the properties of BOOLEAN macro
#define MUXDEF(macro, X, Y) MUX_MACRO_PROPERTY(__P_DEF_, macro, X, Y)
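Note the trailing comma in the __P_DEF_x macros here.
The macro in init_rand() expands to:
CHOOSE2nd(__P_DEF_ ## CONFIG_TARGET_AM 0, time(0))
If CONFIG_TARGET_AM is not defined, the pasted name is not a macro and the call becomes:
CHOOSE2nd(__P_DEF_CONFIG_TARGET_AM 0, time(0))
This call has only two arguments, so the second one, time(0), is selected. If CONFIG_TARGET_AM is defined as 1, it is replaced by 1 before pasting (that is what the concat/concat_temp indirection is for), __P_DEF_1 expands to "X,", and the extra comma shifts 0 into the second position.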
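To see the selection happen, the macros from the include/macro.h excerpt can be copied into a standalone file and tried on one defined and one undefined option. CONFIG_DEMO_ON, CONFIG_DEMO_OFF, and the two demo functions are hypothetical names introduced only for this sketch.

```c
/* Copied from the include/macro.h excerpt above. */
#define concat_temp(x, y) x ## y
#define concat(x, y) concat_temp(x, y)
#define CHOOSE2nd(a, b, ...) b
#define MUX_WITH_COMMA(contain_comma, a, b) CHOOSE2nd(contain_comma a, b)
#define MUX_MACRO_PROPERTY(p, macro, a, b) MUX_WITH_COMMA(concat(p, macro), a, b)
#define __P_DEF_0 X,
#define __P_DEF_1 X,
#define MUXDEF(macro, X, Y) MUX_MACRO_PROPERTY(__P_DEF_, macro, X, Y)

/* Hypothetical options: one is defined, the other is deliberately not. */
#define CONFIG_DEMO_ON 1

/* CONFIG_DEMO_ON is defined, so the first alternative (10) is chosen. */
int demo_defined(void) { return MUXDEF(CONFIG_DEMO_ON, 10, 20); }

/* CONFIG_DEMO_OFF is not defined, so the second alternative (20) is chosen. */
int demo_undefined(void) { return MUXDEF(CONFIG_DEMO_OFF, 10, 20); }
```

Because the choice is made by the preprocessor, the rejected alternative never reaches the compiler, which is why MUXDEF can be used where an ordinary if would not compile.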
From @a3f's answer to "Test if preprocessor symbol is defined inside macro":
Linux's kconfig.h defines an __is_defined macro for this use case:
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
/*
 * Helper macros to use CONFIG_ options in C/CPP expressions. Note that
 * these only work with boolean and tristate options.
 */
/*
 * Getting something that works in C and CPP for an arg that may or may
 * not be defined is tricky. Here, if we have "#define CONFIG_BOOGER 1"
 * we match on the placeholder define, insert the "0," for arg1 and generate
 * the triplet (0, 1, 0). Then the last step cherry picks the 2nd arg (a one).
 * When CONFIG_BOOGER is not defined, we generate a (... 1, 0) pair, and when
 * the last step cherry picks the 2nd arg, we get a zero.
 */
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
It's C99 and works for tristate options (undefined, defined to 0, defined to 1).
From man 2 time:
#include <time.h>
time_t time(time_t *tloc);
time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). If tloc is non-NULL, the return value is also stored in the memory pointed to by tloc.
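The two seeds that init_rand() chooses between behave differently: a fixed seed such as 0 makes rand() produce the same sequence on every run, while a seed taken from time() normally changes from run to run. A minimal sketch, with hypothetical function names:

```c
#include <stdlib.h>
#include <time.h>

/* Seed with a fixed value and return the first rand() result.
 * The same seed always yields the same first value. */
int demo_first_rand(unsigned seed) {
  srand(seed);
  return rand();
}

/* Seed from the clock, in the style of the time(0) alternative. */
void demo_seed_from_clock(void) {
  srand((unsigned)time(NULL));
}
```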
init_log(log_file)
In src/utils/log.c:
void init_log(const char *log_file) {
  log_fp = stdout;
  if (log_file != NULL) {
    FILE *fp = fopen(log_file, "w");
    Assert(fp, "Can not open '%s'", log_file);
    log_fp = fp;
  }
  Log("Log is written to %s", log_file ? log_file : "stdout");
}
If log_file is not NULL, open it for writing and log to it; if it cannot be opened, the Assert fails and NEMU aborts. If log_file is NULL, log to stdout.
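The choice of destination can be isolated in a small helper. This is only a sketch: demo_open_log is a hypothetical function, and unlike init_log() it reports an fopen failure by returning NULL instead of asserting.

```c
#include <stdio.h>

/* Pick the log destination: stdout when no path is given, otherwise
 * the opened file. A NULL result means fopen() failed. */
FILE *demo_open_log(const char *log_file) {
  if (log_file == NULL) return stdout;
  return fopen(log_file, "w");
}
```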
In include/debug.h:
#define Log(format, ...) \
_Log(ANSI_FMT("[%s:%d %s] " format, ANSI_FG_BLUE) "\n", \
__FILE__, __LINE__, __func__, ## __VA_ARGS__)
In include/utils.h:
#define ANSI_NONE "\33[0m"
#define ANSI_FMT(str, fmt) fmt str ANSI_NONE
#define log_write(...) IFDEF(CONFIG_TARGET_NATIVE_ELF, \
do { \
extern FILE* log_fp; \
extern bool log_enable(); \
if (log_enable()) { \
fprintf(log_fp, __VA_ARGS__); \
fflush(log_fp); \
} \
} while (0) \
)
#define _Log(...) \
do { \
printf(__VA_ARGS__); \
log_write(__VA_ARGS__); \
} while (0)
init_mem()
In src/memory/paddr.c:
void init_mem() {
#if defined(CONFIG_PMEM_MALLOC)
  pmem = malloc(CONFIG_MSIZE);
  assert(pmem);
#endif
#ifdef CONFIG_MEM_RANDOM
  uint32_t *p = (uint32_t *)pmem;
  int i;
  for (i = 0; i < (int) (CONFIG_MSIZE / sizeof(p[0])); i ++) {
    p[i] = rand();
  }
#endif
  Log("physical memory area [" FMT_PADDR ", " FMT_PADDR "]", PMEM_LEFT, PMEM_RIGHT);
}
If CONFIG_PMEM_MALLOC is defined, allocate CONFIG_MSIZE bytes for pmem with malloc().
If CONFIG_MEM_RANDOM is defined, fill the physical memory with random data, one uint32_t at a time.
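The same two steps can be tried outside NEMU with a small buffer. In this sketch DEMO_MSIZE and demo_init_mem are hypothetical stand-ins for CONFIG_MSIZE and init_mem(), and both steps are always enabled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size for the demo; NEMU uses CONFIG_MSIZE instead. */
#define DEMO_MSIZE 64

/* Allocate DEMO_MSIZE bytes and fill them one uint32_t at a time.
 * Returns NULL if malloc fails; the caller frees the buffer. */
uint8_t *demo_init_mem(void) {
  uint8_t *mem = malloc(DEMO_MSIZE);
  if (mem == NULL) return NULL;
  uint32_t *p = (uint32_t *)mem;  /* malloc memory is suitably aligned */
  for (size_t i = 0; i < DEMO_MSIZE / sizeof(p[0]); i++) {
    p[i] = (uint32_t)rand();
  }
  return mem;
}
```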
Three macros useful for debugging
Defined in nemu/include/debug.h:
#define Log(format, ...) \
_Log(ANSI_FMT("[%s:%d %s] " format, ANSI_FG_BLUE) "\n", \
__FILE__, __LINE__, __func__, ## __VA_ARGS__)
#define Assert(cond, format, ...) \
do { \
if (!(cond)) { \
MUXDEF(CONFIG_TARGET_AM, printf(ANSI_FMT(format, ANSI_FG_RED) "\n", ## __VA_ARGS__), \
(fflush(stdout), fprintf(stderr, ANSI_FMT(format, ANSI_FG_RED) "\n", ## __VA_ARGS__))); \
IFNDEF(CONFIG_TARGET_AM, extern FILE* log_fp; fflush(log_fp)); \
extern void assert_fail_msg(); \
assert_fail_msg(); \
assert(cond); \
} \
} while (0)
#define panic(format, ...) Assert(0, format, ## __VA_ARGS__)
Log()
In include/utils.h:
#define ANSI_NONE "\33[0m"
#define ANSI_FMT(str, fmt) fmt str ANSI_NONE
#define log_write(...) IFDEF(CONFIG_TARGET_NATIVE_ELF, \
do { \
extern FILE* log_fp; \
extern bool log_enable(); \
if (log_enable()) { \
fprintf(log_fp, __VA_ARGS__); \
fflush(log_fp); \
} \
} while (0) \
)
#define _Log(...) \
do { \
printf(__VA_ARGS__); \
log_write(__VA_ARGS__); \
} while (0)
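Log() prints the source file, line number, and function from which it was called. Note the "[%s:%d %s] " format string and the __FILE__, __LINE__, and __func__ arguments: __FILE__ and __LINE__ are predefined C macros, and __func__ is a predefined identifier introduced in C99.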
#define Log(format, ...) \
_Log(ANSI_FMT("[%s:%d %s] " format, ANSI_FG_BLUE) "\n", \
__FILE__, __LINE__, __func__, ## __VA_ARGS__)
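A stripped-down version of the same idea can be tried in isolation. The sketch below drops the colors and writes into a buffer so the result can be inspected; DEMO_LOG, demo_log_buf, and demo_log_hello are hypothetical names, and "## __VA_ARGS__" is a GNU extension supported by GCC and Clang.

```c
#include <stdio.h>

/* Hypothetical buffer so the formatted line can be inspected. */
static char demo_log_buf[256];

/* Simplified Log(): no colors, output goes to demo_log_buf.
 * "## __VA_ARGS__" drops the preceding comma when no extra
 * arguments are passed. */
#define DEMO_LOG(format, ...) \
  snprintf(demo_log_buf, sizeof(demo_log_buf), "[%s:%d %s] " format "\n", \
           __FILE__, __LINE__, __func__, ## __VA_ARGS__)

/* Log one message and return the buffer holding it. */
const char *demo_log_hello(int value) {
  DEMO_LOG("value = %d", value);
  return demo_log_buf;
}
```

The string literals around format are joined at compile time, which is why format must be a literal and cannot be a variable.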
Assert()
When the test condition cond is false, it prints some information before the assertion fails.
panic()
It "calls" the Assert() macro with cond = 0: it prints the message and terminates the program, which is equivalent to an unconditional assertion failure.
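The monitor loads the guest program into the guest computer
When NEMU starts running, it first calls the init_monitor() function (defined in nemu/src/monitor/monitor.c) to perform some monitor-related initialization.
Next, the monitor calls the init_isa() function (defined in nemu/src/isa/$ISA/init.c) to perform some ISA-related initialization.
  The first task is to load a built-in guest program into memory.
  The second task of init_isa() is to initialize the registers, which is done by the restart() function.
NEMU then returns to init_monitor() and goes on to call the load_img() function (defined in nemu/src/monitor/monitor.c). This function reads a meaningful guest program from an image file into memory, overwriting the built-in guest program loaded earlier. The image file is an optional argument given on the command line that runs NEMU. If it is not given, NEMU runs the built-in guest program....
Finally, the monitor calls the welcome() function to print a welcome message.
After the monitor's initialization is complete, main() goes on to call engine_start() (defined in nemu/src/engine/interpreter/init.c). The code enters the main loop of the Simple Debugger, sdb_mainloop() (defined in nemu/src/monitor/sdb/sdb.c), and prints NEMU's command prompt. The Simple Debugger is the core feature of the monitor.
After you type c at the command prompt, NEMU enters the main instruction-execution loop cpu_exec() (defined in nemu/src/cpu/cpu-exec.c). cpu_exec() in turn calls execute(), which models how a CPU works: it keeps executing instructions. Specifically, the code repeatedly calls exec_once() in a for loop; this function does what the previous subsection described: it makes the CPU execute the instruction that the current PC points to, and then updates the PC.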
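NEMU keeps executing instructions until one of the following happens, at which point it leaves the instruction-execution loop:
The requested number of loop iterations is reached.
The guest program executes the nemu_trap instruction. This is a fictitious special instruction, added so that a guest program can signal the end of execution inside NEMU. NEMU picks some debugging-oriented instructions from the ISA manual and gives them the special meaning of nemu_trap. For example, in the riscv32 manual NEMU chose the ebreak instruction to serve as nemu_trap. To indicate whether the guest program ended successfully, the nemu_trap instruction also takes an argument that represents the exit status. After the guest program executes this instruction, NEMU sets its own end state according to this argument and prints a different message for each state, mainly:
  HIT GOOD TRAP - the guest program finished correctly
  HIT BAD TRAP - the guest program finished with an error
  ABORT - the guest program terminated unexpectedly without finishing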
The functions called after typing c in sdb:
static int cmd_c(char *args) {
  cpu_exec(-1);
  return 0;
}
/* Simulate how the CPU works. */
void cpu_exec(uint64_t n) {
  // some code ...
  execute(n);
  // some code ...
}
static void execute(uint64_t n) {
  Decode s;
  for (;n > 0; n --) {
    exec_once(&s, cpu.pc);
    g_nr_guest_inst ++;
    trace_and_difftest(&s, cpu.pc);
    if (nemu_state.state != NEMU_RUNNING) break;
    IFDEF(CONFIG_DEVICE, device_update());
  }
}
static void exec_once(Decode *s, vaddr_t pc) {
  s->pc = pc;
  s->snpc = pc;
  isa_exec_once(s);
  cpu.pc = s->dnpc;
#ifdef CONFIG_ITRACE
  char *p = s->logbuf;
  p += snprintf(p, sizeof(s->logbuf), FMT_WORD ":", s->pc);
  int ilen = s->snpc - s->pc;
  int i;
  uint8_t *inst = (uint8_t *)&s->isa.inst.val;
  for (i = ilen - 1; i >= 0; i --) {
    p += snprintf(p, 4, " %02x", inst[i]);
  }
  int ilen_max = MUXDEF(CONFIG_ISA_x86, 8, 4);
  int space_len = ilen_max - ilen;
  if (space_len < 0) space_len = 0;
  space_len = space_len * 3 + 1;
  memset(p, ' ', space_len);
  p += space_len;
  void disassemble(char *str, int size, uint64_t pc, uint8_t *code, int nbyte);
  disassemble(p, s->logbuf + sizeof(s->logbuf) - p,
      MUXDEF(CONFIG_ISA_x86, s->snpc, s->pc), (uint8_t *)&s->isa.inst.val, ilen);
#endif
}
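The shape of the execute() loop, with its two exit conditions (the step count running out, or the state leaving the running state), can be reproduced in a toy model. Everything here is a hypothetical stand-in: the demo state enum, demo_pc, and the pretend trap at pc == 5 are not NEMU code.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for NEMU's state and CPU. */
enum { DEMO_RUNNING, DEMO_END };
static int demo_state = DEMO_RUNNING;
static uint64_t demo_pc = 0;

/* "Execute" one instruction: advance pc, and pretend that the
 * instruction reached at pc == 5 is a trap that ends the program. */
static void demo_exec_once(void) {
  demo_pc++;
  if (demo_pc == 5) demo_state = DEMO_END;
}

/* Same loop shape as execute(): stop after n steps or on a state change.
 * Returns the number of instructions executed by this call. */
uint64_t demo_execute(uint64_t n) {
  uint64_t executed = 0;
  for (; n > 0; n--) {
    demo_exec_once();
    executed++;
    if (demo_state != DEMO_RUNNING) break;
  }
  return executed;
}
```

Calling demo_execute(3) runs three steps and returns because the count is used up; a following call with a very large count stops after two more steps because the state changed.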
Blue-box questions
How long will it run?
In the cmd_c() function, cpu_exec() is called with the argument -1. Do you know what this means?
static void execute(uint64_t n) {
  // ...
  for (;n > 0; n --) {
    // ...
  }
}
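cpu_exec() passes this -1 on to execute(). Both functions take a parameter of the unsigned type uint64_t, so -1 is implicitly converted to the largest unsigned value, UINT64_MAX (2^64 - 1). execute() then counts down from that value toward 0, executing one instruction per iteration, so in practice NEMU keeps running until the guest program ends.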
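The value that arrives in the callee can be checked directly. In this sketch demo_as_u64 is a hypothetical function that only mirrors the parameter type of cpu_exec().

```c
#include <stdint.h>

/* Mirrors the parameter type of cpu_exec(): the int argument -1 is
 * converted to uint64_t at the call and returned unchanged. */
uint64_t demo_as_u64(uint64_t n) {
  return n;
}
```

With this definition, demo_as_u64(-1) compares equal to UINT64_MAX from <stdint.h>.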
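A potential threat (recommended for your second playthrough)
"cpu_exec() is called with the argument -1": is this undefined behavior? Consult the C99 standard to confirm your answer.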