Systems

2019 Master

VR System

Recent studies have shown that Virtual Reality (VR) is highly usable for educational purposes, including applications that serve as tutorials for learning something new. This study developed AutowareTRY, an in-car VR application that helps people explore and understand the concepts of autonomous driving. The app includes a 360-degree tour and a gaming experience with navigation aided by an Embodied Conversational Agent (ECA). We evaluated our system in a real-world in-car VR setup by comparing the user experience, workload, and learning effect against a non-in-car version of the same content. Statistical tests revealed that, overall, users preferred the in-car version.

 



   

 

2018 Master

Data Structures for Subset Undo Operations

"Undo" is a common operation not only in painting tools and text editors but also in systems that deal with uncertain situations, such as exception handling in programming languages and rollbacks in databases. In these applications and their various extensions, all states are managed and linked to a single global clock (the "absolute time" model). However, because this model has only one clock, an "undo" operation also rewinds states that did not need to be rewound. This thesis presents a state management model in which each variable has its own clock, together with data structures for it. The model explicitly treats clocks not only for variables but also for the users of the software, which makes it possible to express undo applications that are difficult to express with existing methods.
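
The per-variable clock idea can be illustrated with a minimal sketch. This is not the thesis's data structure: the class name `SubsetUndoStore` and its methods are hypothetical, each variable's clock is simply taken to be the index of its latest value, and the per-user clocks mentioned in the abstract are left out.

```python
class SubsetUndoStore:
    """Toy store in which every variable keeps its own history (hypothetical API)."""

    def __init__(self):
        # name -> list of values; the variable's clock is len(history) - 1
        self._history = {}

    def set(self, name, value):
        # Writing a variable advances only that variable's clock.
        self._history.setdefault(name, []).append(value)

    def get(self, name):
        return self._history[name][-1]

    def clock(self, name):
        return len(self._history[name]) - 1

    def undo(self, names, steps=1):
        # Rewind only the listed variables; all others keep their state.
        for name in names:
            history = self._history[name]
            # Always keep the initial value so the variable stays defined.
            del history[max(1, len(history) - steps):]


store = SubsetUndoStore()
store.set("x", 1)
store.set("y", 10)
store.set("x", 2)
store.set("y", 20)
store.undo(["x"])  # rewinds x only; y is untouched
```

With a single global clock, undoing the last write to `x` would also discard the later write to `y`; here `y` keeps its latest value.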



 
IPC-aware CPU Resource Management for Many-core Processors

Many-core SoCs, which contain energy-efficient cores, are expected to become common in both edge devices and cloud servers because such architectures can reduce the power consumption of the overall system. However, it is not straightforward to design state-of-the-art system software that maximizes application performance on them. When the operating system interrupts a running application, the application's IPC (instructions per cycle) degrades, and its throughput suffers considerably on an energy-efficient processor core, which has limited cache memory and a lower maximum IPC. Previous work has studied mitigating the overhead of system calls, but optimizing the performance of offloading and batching system calls remains an open problem. This thesis introduces a CPU resource management scheme that minimizes the IPC degradation of applications while maintaining throughput on many-core SoCs. The scheme, named one-core one-task, lets each thread occupy a processor core exclusively to mitigate context-switch overhead, and offloads selected cache-polluting system calls, which may cause high IPC degradation, to another core. The author also presents ScalableSC, a scalable communication algorithm with effective resource utilization, which is essential to this model.

 



2019 Bachelor

Operating System for Heterogeneous Multi-core Processors

Improving the responsiveness of aperiodic tasks while satisfying real-time constraints is an important challenge in edge computing. To achieve this goal, a multi-kernel design was previously proposed that reduces overhead by allocating CPU and memory exclusively to an aperiodic task. However, in that previous research, the types of programs that could be executed were limited, and the evaluation was therefore also limited to a few simple situations. In this study, we improved the core isolation and memory allocation of the previous research to enable a broader range of programs to be executed. In addition, we ran more general benchmarks on the improved system and performed a detailed evaluation of the previous research.



Simulation of DAG-Based Scheduling on Multi-core Processors

Schedulability analysis of real-time systems is of major concern and significance. While various studies have already been conducted on uniprocessor systems, scheduling DAG tasks on multi-core processors, in which tasks must execute in a predefined order, remains open for further investigation. One way to test the schedulability of DAG task systems is to simulate the scheduling algorithms in software, because running the algorithms and verifying schedulability on real hardware almost inevitably suffers from external noise, such as interrupts and exceptions raised by processors, which ruins the reliability of the results. In this thesis, we implemented a simulator that is immune to this noise in order to test the schedulability of DAG task systems. We then conducted experiments to show the results of simulating randomly generated DAG task systems.
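
To make the idea of a noise-free software simulation concrete, the following is a minimal sketch of one possible simulator, not the one implemented in the thesis. It assumes non-preemptive list scheduling with ready nodes dispatched in name order; the function name `simulate_dag` and the example task set are hypothetical.

```python
import heapq


def simulate_dag(wcet, edges, num_cores):
    """Simulate non-preemptive list scheduling of one DAG task.

    wcet: dict mapping node name -> execution time
    edges: list of (predecessor, successor) pairs
    Returns (makespan, start_times).
    """
    successors = {v: [] for v in wcet}
    pending_preds = {v: 0 for v in wcet}
    for u, v in edges:
        successors[u].append(v)
        pending_preds[v] += 1

    ready = sorted(v for v in wcet if pending_preds[v] == 0)
    running = []  # min-heap of (finish_time, node)
    start_times = {}
    now = 0
    free_cores = num_cores

    while ready or running:
        # Dispatch ready nodes while idle cores remain.
        while ready and free_cores > 0:
            node = ready.pop(0)
            start_times[node] = now
            heapq.heappush(running, (now + wcet[node], node))
            free_cores -= 1
        # Advance simulated time to the next completion.
        now, finished = heapq.heappop(running)
        free_cores += 1
        for nxt in successors[finished]:
            pending_preds[nxt] -= 1
            if pending_preds[nxt] == 0:
                ready.append(nxt)
        ready.sort()

    if len(start_times) != len(wcet):
        raise ValueError("dependency graph contains a cycle")
    return now, start_times


# Hypothetical diamond-shaped DAG: a -> {b, c} -> d
wcet = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
makespan, starts = simulate_dag(wcet, edges, num_cores=2)
schedulable = makespan <= 8  # compare against an assumed deadline of 8
```

Because time advances only at completion events, the result is fully deterministic, which is the property that makes simulation attractive compared with measurements on real hardware.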



 
A Scalable System Call Mechanism for Embedded Many-Core SoCs

In many-core SoCs, application throughput is highly affected by the overhead of context switches caused by the operating system, because each core has limited cache memory and a low maximum IPC (instructions per cycle). To reduce this system call overhead, a CPU resource management scheme has been proposed in which each thread exclusively uses a single processor core and offloads tasks that cause high IPC degradation to another core, together with a communication algorithm called ScalableSC that implements this model. This approach is compatible with various architectures because it does not require modifying Linux kernel code. In the previous research, ScalableSC was verified on an x64 machine. In this thesis, we carried out the verification on an ARM machine, using the Linux application Memcached as an example, to show the improvement in application throughput and to confirm the effectiveness and compatibility of ScalableSC.



 

2018 Bachelor

Shared Runtime for Memory-effective Unikernels

Library operating systems, also known as Library OSes, are becoming common for cloud-oriented single-purpose applications. In particular, unikernels, which are compiled statically from an application and the functionality provided by a Library OS and deployed on a virtual machine, achieve a minimal software stack and isolation of application tasks at the same time. An existing design approach to unikernels, however, gives each unikernel instance its own identical copy of the memory space even though the runtime and functionality are common, which wastes memory resources. In this thesis, a new design approach to Library OSes is presented that provides a mechanism for sharing overlapping resources among unikernels. Our preliminary evaluation results show that the presented approach reduces memory usage.

 



   
Cache-aware Splitkernels

Distributed operating systems often aim at overcoming data center problems, such as hardware maintenance and component management. LegoOS is a relevant distributed operating system that focuses on the modularity of hardware and components based on the concept of a "splitkernel". LegoOS categorizes hardware into three components: pComponent (processor), mComponent (memory), and sComponent (storage). The pComponent virtually provides ExCache as the last-level cache (LLC) to accelerate data access. However, the performance of LegoOS is not as good as that of traditional operating systems, and one probable cause is the ExCache performance. Some data center applications handle large amounts of data with sequential locality, for which a traditional simple prefetching algorithm works well. In this study, the author therefore adapts a traditional prefetching algorithm to LegoOS to improve ExCache performance and evaluates the result.
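
The benefit of simple prefetching for sequential data can be illustrated with a small cache model. This is only a sketch under stated assumptions: it models a generic LRU cache with sequential prefetch on a miss, not the actual ExCache or the prefetcher used in the study, and the names `count_misses` and `prefetch_depth` are hypothetical.

```python
from collections import OrderedDict


def count_misses(accesses, capacity, prefetch_depth=0):
    """Count misses in an LRU cache that prefetches the next blocks on a miss."""
    cache = OrderedDict()  # block number -> True, ordered from LRU to MRU
    misses = 0

    def insert(block):
        cache[block] = True
        cache.move_to_end(block)
        while len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # hit: refresh recency
        else:
            misses += 1
            insert(block)
            # Sequential prefetch: also fetch the following blocks.
            for offset in range(1, prefetch_depth + 1):
                insert(block + offset)
    return misses


sequential = list(range(16))
misses_plain = count_misses(sequential, capacity=8)
misses_prefetch = count_misses(sequential, capacity=8, prefetch_depth=3)
```

On a purely sequential access stream, each miss brings in the next three blocks, so only every fourth access misses in this model.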

 



   
Fast Inter-Unikernel-as-Process Communication

In cloud computing environments, multiple tenants competing for shared resources need to be isolated, and virtual machines (VMs) have often been deployed to address this isolation problem. Unikernels are a relevant variant of this solution, in which particular single-purpose applications are deployed on VMs. To improve the usability of unikernels while keeping their isolation level, previous work presented the "Unikernel as Process" approach, in which unikernels are executed as processes. However, this design suffers from a limited interface to external modules, and as a result, performance is also sacrificed. One example of this performance loss arises from the lack of fast communication methods across unikernels. In this thesis, a new approach to Inter-Unikernel-as-Process Communication (IUPC) is presented, which enables shared memory across unikernels without compromising the isolation level. The basic idea behind this approach is to have unikernel processes request a shared memory object handle from a mediator process, receive the handle from the mediator, and then map the shared memory.
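
The request, receive, and map sequence can be sketched with Python's standard `multiprocessing.shared_memory` module. This is an illustration only, not the IUPC implementation: the `Mediator` class is hypothetical, it runs in the same process as the two endpoints for brevity, and the segment name stands in for the handle.

```python
from multiprocessing import shared_memory


class Mediator:
    """Hypothetical mediator that owns shared memory segments and hands out handles."""

    def __init__(self):
        self._segments = {}

    def create(self, size):
        # Create the segment on behalf of the requester and return its name as the handle.
        segment = shared_memory.SharedMemory(create=True, size=size)
        self._segments[segment.name] = segment
        return segment.name

    def destroy(self, handle):
        segment = self._segments.pop(handle)
        segment.close()
        segment.unlink()


def exchange(message):
    mediator = Mediator()
    handle = mediator.create(64)                         # 1. request creation, receive handle
    sender = shared_memory.SharedMemory(name=handle)     # 2. map the shared memory
    receiver = shared_memory.SharedMemory(name=handle)
    try:
        sender.buf[:len(message)] = message              # 3. write on one side
        return bytes(receiver.buf[:len(message)])        #    and read on the other
    finally:
        sender.close()
        receiver.close()
        mediator.destroy(handle)


received = exchange(b"hello")
```

In a real deployment the mediator would be a separate process that decides which endpoints may receive a handle, so the endpoints never create or name segments themselves.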

 



   
A Linear System Solver Based on 2D Array Processors for Scan Matching

Scan matching algorithms are increasingly used in many autonomous applications, such as mobile robots and autonomous vehicles. Given the workload of scan matching, a hardware implementation of the algorithm logic is worth considering. Such an implementation needs to solve a system of linear equations to prepare data as a preprocessing step for the scan matching algorithm. In this thesis, a highly parallel method for solving a system of linear equations is presented so that the whole scan matching algorithm can be implemented in hardware. The method is based on Gaussian elimination, and the computation corresponding to each element of the given matrix is assigned to its own processor element to achieve extremely high parallelism in solving the linear equations. A prototype system is implemented on an FPGA device, while an ASIC device is considered for a production system.
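
The element-per-processor idea can be sketched in software. The example below is an assumption-laden illustration rather than the thesis's design: it uses the Gauss-Jordan variant of elimination, exact `Fraction` arithmetic, and a hypothetical `cell_update` function that plays the role of one processor element. In each step, every cell's new value depends only on values from the previous step, so all cells could be updated in parallel.

```python
from fractions import Fraction


def cell_update(i, j, k, m, pivot):
    """New value of cell (i, j) in elimination step k, using only old values."""
    if i == k:
        return m[k][j] / pivot  # normalize the pivot row
    return m[i][j] - m[i][k] * m[k][j] / pivot  # eliminate column k from row i


def solve_elementwise(a, b):
    n = len(a)
    # Augmented matrix [A | b] with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Row swap so that the pivot is nonzero.
        p = next((r for r in range(k, n) if m[r][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        # Every cell is computed independently from the old matrix m.
        m = [[cell_update(i, j, k, m, pivot) for j in range(n + 1)]
             for i in range(n)]
    return [row[n] for row in m]


# 2x + y = 3, x + 3y = 5
solution = solve_elementwise([[2, 1], [1, 3]], [3, 5])
```

The list comprehension evaluates cells one after another, but because it reads only the old matrix, the same update rule maps directly onto a 2D grid of processing elements.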



 

2017 Bachelor

ROS Scheduler

Real-time systems, such as autonomous driving systems, must satisfy deadlines and resolve task dependencies at the same time. Such scheduling requires building a directed acyclic graph (DAG) of task dependencies and scheduling the graph with task priorities taken into account. However, when executing a large and complex system with data-flow task sets, priority-based scheduling makes it difficult to meet the deadlines of important tasks. The author proposes a job-grouped DAG task model for real-time systems and a real-time scheduling framework for it. The proposed framework focuses on the deadline of each group rather than that of each task, which enables it to meet those deadlines. By modifying ROSCH, a real-time scheduling framework for the Robot Operating System, the author shows that the proposed group-optimized scheduling framework reduces the number of deadline misses.

 



 

 

Operating System for Heterogeneous Multi-core Processors

Single-ISA heterogeneous multi-core architectures, which combine high-performance large cores and power-efficient small ones, are attractive for systems with diverse workloads. Previous work has enhanced whole-system throughput on these architectures by improving thread scheduling algorithms. In this approach, however, the overhead caused by scheduling and task migrations becomes a problem. This paper presents an operating system (OS) design incorporating the multikernel model. A large core and other resources are isolated and then allocated exclusively to a specified process. Thus, the overhead caused by scheduling and task migrations, as well as mutual exclusion when accessing shared resources, is avoided, leading to improved throughput. The author has implemented and evaluated the proposed OS design on real ARM big.LITTLE platforms to show that the design is promising. The evaluation indicates improved throughput for processes that access shared resources frequently.

 



 

 

SLAM Accelerator Design

3D scan registration is an important method for localization on mobile devices. The 3D normal distribution transform (3D-NDT) is an efficient algorithm for 3D registration compared to the iterative closest point (ICP) algorithm. Input point data are captured by 3D laser range finders, whose resolutions are continuously increasing. Localization for fast-moving objects such as automobiles requires short turnaround times on the order of milliseconds. At the same time, embedded systems are sensitive to power consumption. CPU and GPU implementations of the algorithm exist, but their lack of flexibility makes it difficult to balance computational capability and usability in mobile systems. To satisfy these requirements, this dissertation presents a hardware implementation of the algorithm on an FPGA with an appropriate data type. The result provides a new option for 3D registration, which is also expected to be implemented as an ASIC in future work.

 



 

 

2016 Bachelor

Operating System for Many-core Processors

We design and implement operating systems for many-core processors. In particular, we investigate the scalability and performance improvement in operating systems and device drivers on Intel Xeon Phi Knights Landing many-core processors.