What is runC?

This article is a featured article from the Kanxue Forum.
Kanxue Forum author ID: 时钟
I
Control registers
The OCI standard
OCI bundle
The runC framework

```go
type BaseContainer interface {
	// Returns the ID of the container
	ID() string

	// Returns the current status of the container.
	Status() (Status, error)

	// State returns the current container's state information.
	State() (*State, error)

	// OCIState returns the current container's state information.
	OCIState() (*specs.State, error)

	// Returns the current config of the container.
	Config() configs.Config

	// Returns the PIDs inside this container. The PIDs are in the namespace of the calling process.
	//
	// Some of the returned PIDs may no longer refer to processes in the Container, unless
	// the Container state is PAUSED in which case every PID in the slice is valid.
	Processes() ([]int, error)

	// Returns statistics for the container.
	Stats() (*Stats, error)

	// Set resources of container as configured
	//
	// We can use this to change resources when containers are running.
	//
	Set(config configs.Config) error

	// Start a process inside the container. Returns error if process fails to
	// start. You can track process lifecycle with passed Process structure.
	Start(process *Process) (err error)

	// Run immediately starts the process inside the container. Returns error if process
	// fails to start. It does not block waiting for the exec fifo after start returns but
	// opens the fifo after start returns.
	Run(process *Process) (err error)

	// Destroys the container, if its in a valid state, after killing any
	// remaining running processes.
	//
	// Any event registrations are removed before the container is destroyed.
	// No error is returned if the container is already destroyed.
	//
	// Running containers must first be stopped using Signal(..).
	// Paused containers must first be resumed using Resume(..).
	Destroy() error

	// Signal sends the provided signal code to the container's initial process.
	//
	// If all is specified the signal is sent to all processes in the container
	// including the initial process.
	Signal(s os.Signal, all bool) error

	// Exec signals the container to exec the users process at the end of the init.
	Exec() error
}
```
```go
// Container is a libcontainer container object.
//
// Each container is thread-safe within the same process. Since a container can
// be destroyed by a separate process, any function may return that the container
// was not found.
type Container interface {
	BaseContainer

	// Methods below here are platform specific

	// Checkpoint checkpoints the running container's state to disk using the criu(8) utility.
	Checkpoint(criuOpts *CriuOpts) error

	// Restore restores the checkpointed container to a running state using the criu(8) utility.
	Restore(process *Process, criuOpts *CriuOpts) error

	// If the Container state is RUNNING or CREATED, sets the Container state to PAUSING and pauses
	// the execution of any user processes. Asynchronously, when the container finished being paused the
	// state is changed to PAUSED.
	// If the Container state is PAUSED, do nothing.
	Pause() error

	// If the Container state is PAUSED, resumes the execution of any user processes in the
	// Container before setting the Container state to RUNNING.
	// If the Container state is RUNNING, do nothing.
	Resume() error

	// NotifyOOM returns a read-only channel signaling when the container receives an OOM notification.
	NotifyOOM() (<-chan struct{}, error)

	// NotifyMemoryPressure returns a read-only channel signaling when the container reaches a given pressure level
	NotifyMemoryPressure(level PressureLevel) (<-chan struct{}, error)
}
```
```go
type Factory interface {
	// Creates a new container with the given id and starts the initial process inside it.
	// id must be a string containing only letters, digits and underscores and must contain
	// between 1 and 1024 characters, inclusive.
	//
	// The id must not already be in use by an existing container. Containers created using
	// a factory with the same path (and filesystem) must have distinct ids.
	//
	// Returns the new container with a running process.
	//
	// On error, any partially created container parts are cleaned up (the operation is atomic).
	Create(id string, config *configs.Config) (Container, error)

	// Load takes an ID for an existing container and returns the container information
	// from the state. This presents a read only view of the container.
	Load(id string) (Container, error)

	// StartInitialization is an internal API to libcontainer used during the reexec of the
	// container.
	StartInitialization() error

	// Type returns info string about factory type (e.g. lxc, libcontainer...)
	Type() string
}
```
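The Create() doc comment states two rules for container IDs: 1 to 1024 characters drawn from letters, digits and underscores (interpreted here as ASCII), and no reuse of an ID that is already taken. A minimal sketch of both checks follows. `validID` and the in-memory `registry` are hypothetical helpers; the check libcontainer actually performs may be more permissive than the comment, and a real factory presumably detects reuse through its `Root` state directory rather than a map.

```go
package main

import "fmt"

// validID applies the ID rule from the Factory.Create doc comment:
// 1 to 1024 characters, each an ASCII letter, digit or underscore.
func validID(id string) bool {
	if len(id) < 1 || len(id) > 1024 {
		return false
	}
	for _, r := range id {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		if !isLetter && !isDigit && r != '_' {
			return false
		}
	}
	return true
}

// registry is a hypothetical in-memory record of IDs already in use.
type registry map[string]struct{}

// reserve rejects malformed IDs and IDs that are already taken.
func (r registry) reserve(id string) error {
	if !validID(id) {
		return fmt.Errorf("invalid container id %q", id)
	}
	if _, taken := r[id]; taken {
		return fmt.Errorf("container id %q is already in use", id)
	}
	r[id] = struct{}{}
	return nil
}

func main() {
	reg := registry{}
	fmt.Println(reg.reserve("web_01")) // <nil>
	fmt.Println(reg.reserve("web_01")) // container id "web_01" is already in use
	fmt.Println(reg.reserve("web-01")) // invalid container id "web-01"
}
```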
```go
// LinuxFactory implements the default factory interface for linux based systems.
type LinuxFactory struct {
	// Root directory for the factory to store state.
	Root string

	// InitPath is the path for calling the init responsibilities for spawning
	// a container.
	InitPath string

	// InitArgs are arguments for calling the init responsibilities for spawning
	// a container.
	InitArgs []string

	// CriuPath is the path to the criu binary used for checkpoint and restore of
	// containers.
	CriuPath string

	// New{u,g}idmapPath is the path to the binaries used for mapping with
	// rootless containers.
	NewuidmapPath string
	NewgidmapPath string

	// Validator provides validation to container configurations.
	Validator validate.Validator

	// NewIntelRdtManager returns an initialized Intel RDT manager for a single container.
	NewIntelRdtManager func(config *configs.Config, id string, path string) intelrdt.Manager
}
```
```go
type linuxContainer struct {
	id                   string
	root                 string
	config               *configs.Config
	cgroupManager        cgroups.Manager
	intelRdtManager      intelrdt.Manager
	initPath             string
	initArgs             []string
	initProcess          parentProcess
	initProcessStartTime uint64
	criuPath             string
	newuidmapPath        string
	newgidmapPath        string
	m                    sync.Mutex
	criuVersion          int
	state                containerState
	created              time.Time
	fifo                 *os.File
}
```
```go
func createContainer(context *cli.Context, id string, spec *specs.Spec) (libcontainer.Container, error) {
	rootlessCg, err := shouldUseRootlessCgroupManager(context)
	if err != nil {
		return nil, err
	}
	config, err := specconv.CreateLibcontainerConfig(&specconv.CreateOpts{
		CgroupName:       id,
		UseSystemdCgroup: context.GlobalBool("systemd-cgroup"),
		NoPivotRoot:      context.Bool("no-pivot"),
		NoNewKeyring:     context.Bool("no-new-keyring"),
		Spec:             spec,
		RootlessEUID:     os.Geteuid() != 0,
		RootlessCgroups:  rootlessCg,
	})
	if err != nil {
		return nil, err
	}

	factory, err := loadFactory(context)
	if err != nil {
		return nil, err
	}
	return factory.Create(id, config)
}
```
```go
// New returns a linux based container factory based in the root directory and
// configures the factory with the provided option funcs.
func New(root string, options ...func(*LinuxFactory) error) (Factory, error) {
	if root != "" {
		if err := os.MkdirAll(root, 0o700); err != nil {
			return nil, err
		}
	}
	l := &LinuxFactory{
		Root:      root,
		InitPath:  "/proc/self/exe",
		InitArgs:  []string{os.Args[0], "init"},
		Validator: validate.New(),
		CriuPath:  "criu",
	}

	for _, opt := range options {
		if opt == nil {
			continue
		}
		if err := opt(l); err != nil {
			return nil, err
		}
	}
	return l, nil
}
```
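New follows Go's functional-options pattern: it fills in defaults, then applies each option in order, skipping nil options and aborting on the first error. The defaults make the factory re-execute the current binary (`/proc/self/exe`) with an `init` argument, which is how the `runc init` child process is launched. The sketch below reproduces only the option-handling shape; `factory`, `Option`, `newFactory` and `withCriuPath` are hypothetical names, and the `os.MkdirAll` of the root directory is left out so the example has no side effects.

```go
package main

import (
	"errors"
	"fmt"
)

// factory is a hypothetical, trimmed-down stand-in for LinuxFactory.
type factory struct {
	Root     string
	InitPath string
	InitArgs []string
	CriuPath string
}

// Option is a hypothetical name for the func(*LinuxFactory) error option type.
type Option func(*factory) error

// newFactory mirrors the shape of New: defaults first, then options in order.
func newFactory(root string, options ...Option) (*factory, error) {
	f := &factory{
		Root:     root,
		InitPath: "/proc/self/exe",
		InitArgs: []string{"runc", "init"}, // libcontainer uses os.Args[0] here
		CriuPath: "criu",
	}
	for _, opt := range options {
		if opt == nil {
			continue // nil options are ignored, as in New
		}
		if err := opt(f); err != nil {
			return nil, err // the first failing option aborts construction
		}
	}
	return f, nil
}

// withCriuPath is a hypothetical option overriding the criu binary path.
func withCriuPath(path string) Option {
	return func(f *factory) error {
		if path == "" {
			return errors.New("criu path must not be empty")
		}
		f.CriuPath = path
		return nil
	}
}

func main() {
	f, err := newFactory("/run/example", withCriuPath("/usr/sbin/criu"), nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.InitPath, f.InitArgs, f.CriuPath) // /proc/self/exe [runc init] /usr/sbin/criu
}
```

Because options are plain functions, callers can add new settings without changing the constructor's signature, and an option can validate its own input.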
```go
switch r.action {
case CT_ACT_CREATE:
	err = r.container.Start(process)
case CT_ACT_RESTORE:
	err = r.container.Restore(process, r.criuOpts)
case CT_ACT_RUN:
	err = r.container.Run(process)
default:
	panic("Unknown action")
}
```
```go
// Process specifies the configuration and IO for a process inside
// a container.
type Process struct {
	// The command to be run followed by any arguments.
	Args []string

	// Env specifies the environment variables for the process.
	Env []string

	// User will set the uid and gid of the executing process running inside the container
	// local to the container's user and group configuration.
	User string

	// AdditionalGroups specifies the gids that should be added to supplementary groups
	// in addition to those that the user belongs to.
	AdditionalGroups []string

	// Cwd will change the processes current working directory inside the container's rootfs.
	Cwd string

	// Stdin is a pointer to a reader which provides the standard input stream.
	Stdin io.Reader

	// Stdout is a pointer to a writer which receives the standard output stream.
	Stdout io.Writer

	// Stderr is a pointer to a writer which receives the standard error stream.
	Stderr io.Writer

	// ExtraFiles specifies additional open files to be inherited by the container
	ExtraFiles []*os.File

	// Initial sizings for the console
	ConsoleWidth  uint16
	ConsoleHeight uint16

	// Capabilities specify the capabilities to keep when executing the process inside the container
	// All capabilities not specified will be dropped from the processes capability mask
	Capabilities *configs.Capabilities

	// AppArmorProfile specifies the profile to apply to the process and is
	// changed at the time the process is execed
	AppArmorProfile string

	// Label specifies the label to apply to the process. It is commonly used by selinux
	Label string

	// NoNewPrivileges controls whether processes can gain additional privileges.
	NoNewPrivileges *bool

	// Rlimits specifies the resource limits, such as max open files, to set in the container
	// If Rlimits are not set, the container will inherit rlimits from the parent process
	Rlimits []configs.Rlimit

	// ConsoleSocket provides the masterfd console.
	ConsoleSocket *os.File

	// Init specifies whether the process is the first process in the container.
	Init bool

	ops processOperations

	LogLevel string

	// SubCgroupPaths specifies sub-cgroups to run the process in.
	// Map keys are controller names, map values are paths (relative to
	// container's top-level cgroup).
	//
	// If empty, the default top-level container's cgroup is used.
	//
	// For cgroup v2, the only key allowed is "".
	SubCgroupPaths map[string]string
}
```
```go
func (c *linuxContainer) Start(process *Process) error {
	c.m.Lock()
	defer c.m.Unlock()
	if c.config.Cgroups.Resources.SkipDevices {
		return errors.New("can't start container with SkipDevices set")
	}
	if process.Init {
		if err := c.createExecFifo(); err != nil {
			return err
		}
	}
	if err := c.start(process); err != nil {
		if process.Init {
			c.deleteExecFifo()
		}
		return err
	}
	return nil
}
```
```go
func (c *linuxContainer) start(process *Process) (retErr error) {
	parent, err := c.newParentProcess(process)
	if err != nil {
		return fmt.Errorf("unable to create new parent process: %w", err)
	}

	logsDone := parent.forwardChildLogs()
	if logsDone != nil {
		defer func() {
			// Wait for log forwarder to finish. This depends on
			// runc init closing the _LIBCONTAINER_LOGPIPE log fd.
			err := <-logsDone
			if err != nil && retErr == nil {
				retErr = fmt.Errorf("unable to forward init logs: %w", err)
			}
		}()
	}

	if err := parent.start(); err != nil {
		return fmt.Errorf("unable to start container process: %w", err)
	}

	if process.Init {
		c.fifo.Close()
		if c.config.Hooks != nil {
			s, err := c.currentOCIState()
			if err != nil {
				return err
			}

			if err := c.config.Hooks[configs.Poststart].RunHooks(s); err != nil {
				if err := ignoreTerminateErrors(parent.terminate()); err != nil {
					logrus.Warn(fmt.Errorf("error running poststart hook: %w", err))
				}
				return err
			}
		}
	}
	return nil
}
```
```go
type initProcess struct {
	cmd             *exec.Cmd
	messageSockPair filePair
	logFilePair     filePair
	config          *initConfig
	manager         cgroups.Manager
	intelRdtManager intelrdt.Manager
	container       *linuxContainer
	fds             []string
	process         *Process
	bootstrapData   io.Reader
	sharePidns      bool
}
```
```go
type parentProcess interface {
	// pid returns the pid for the running process.
	pid() int

	// start starts the process execution.
	start() error

	// send a SIGKILL to the process and wait for the exit.
	terminate() error

	// wait waits on the process returning the process state.
	wait() (*os.ProcessState, error)

	// startTime returns the process start time.
	startTime() (uint64, error)
	signal(os.Signal) error
	externalDescriptors() []string
	setExternalDescriptors(fds []string)
	forwardChildLogs() chan error
}
```
```go
func (c *linuxContainer) newInitProcess(p *Process, cmd *exec.Cmd, messageSockPair, logFilePair filePair) (*initProcess, error) {
	cmd.Env = append(cmd.Env, "_LIBCONTAINER_INITTYPE="+string(initStandard))
	nsMaps := make(map[configs.NamespaceType]string)
	for _, ns := range c.config.Namespaces {
		if ns.Path != "" {
			nsMaps[ns.Type] = ns.Path
		}
	}
	_, sharePidns := nsMaps[configs.NEWPID]
	data, err := c.bootstrapData(c.config.Namespaces.CloneFlags(), nsMaps, initStandard)
	if err != nil {
		return nil, err
	}

	if c.shouldSendMountSources() {
		// Elements on this slice will be paired with mounts (see StartInitialization() and
		// prepareRootfs()). This slice MUST have the same size as c.config.Mounts.
		mountFds := make([]int, len(c.config.Mounts))
		for i, m := range c.config.Mounts {
			if !m.IsBind() {
				// Non bind-mounts do not use an fd.
				mountFds[i] = -1
				continue
			}

			// The fd passed here will not be used: nsexec.c will overwrite it with dup3(). We just need
			// to allocate a fd so that we know the number to pass in the environment variable. The fd
			// must not be closed before cmd.Start(), so we reuse messageSockPair.child because the
			// lifecycle of that fd is already taken care of.
			cmd.ExtraFiles = append(cmd.ExtraFiles, messageSockPair.child)
			mountFds[i] = stdioFdCount + len(cmd.ExtraFiles) - 1
		}

		mountFdsJson, err := json.Marshal(mountFds)
		if err != nil {
			return nil, fmt.Errorf("Error creating _LIBCONTAINER_MOUNT_FDS: %w", err)
		}

		cmd.Env = append(cmd.Env,
			"_LIBCONTAINER_MOUNT_FDS="+string(mountFdsJson),
		)
	}

	init := &initProcess{
		cmd:             cmd,
		messageSockPair: messageSockPair,
		logFilePair:     logFilePair,
		manager:         c.cgroupManager,
		intelRdtManager: c.intelRdtManager,
		config:          c.newInitConfig(p),
		container:       c,
		process:         p,
		bootstrapData:   data,
		sharePidns:      sharePidns,
	}
	c.initProcess = init
	return init, nil
}
```
Kanxue ID: 时钟
https://bbs.pediy.com/user-home-831025.htm
