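2015 Stanford University Internship Program Report by Alan

Posted by: | September 16, 2015

This is Takahashi. For the past several years NaviPlus has hosted interns from Stanford University, and last year we shared an article written by Ruben. This year we asked Alan Salimov to write about his internship, and we are pleased to share it here.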

Introduction

Hello, my name is Alan Salimov. I’m a junior at Stanford University majoring in Computer Science on the Systems track. Some of the most fascinating experiences I’ve had with computer science have come in the form of large, powerful systems, and I’m interested in learning more about how they work from the ground up. I am from Yorktown, Virginia, USA and currently live in the Bay Area of California. I’ve taken Japanese for one year as of writing, and have been living here for six months for study abroad and this internship.

I’m a big fan of coffee (roasting it, tasting it, drinking it), reading, whisky, and touching up my .vimrc file when I should be working.


Academics

As I’ve mentioned, my degree is focused on systems and systems programming. That said, Stanford Computer Science does not limit its students to their specializations; as a result I’ve been exposed to peers and classes covering the breadth of computer science. One of my favorite classes was a web application class, where I learned how model-view-controller, authentication, and everything else come together to make the internet as we know it happen. I also had my first experience with Ruby (on Rails) there, which has a surprising number of similarities to Japanese.

I took Japanese as a gateway to study abroad and to a part of the world I had never been to before. Through cultural classes on campus and abroad, I’ve also gained a unique perspective on foreign and Japanese business and culture, something valuable that I will cherish as I go forward, wherever that may be.

Internship

On my first day, I was laden with several backpacks and lost near 代官山駅. At our internship orientations the commandments were: do not be late, do not be on time, be early. It took a bit of thought (and some luck, when my supervisor, 梅染さん, happened to run into me while I was lost inside the building itself) but soon I was settling in.

From the get-go my future coworkers struck me as people who focused on creating a community. The open office, frequent conversations, and atmosphere all meant that problems were solved as a team. As a result, settling in also meant becoming part of the team, and from day 1 I felt included and welcome. Events, lunches, and my coworkers’ incredible patience with my downright bad Japanese made me feel like my work mattered and I mattered as well. Through daily email reports and weekly progress meetings, my coworkers kept me on the right path in my project.

Goals

In an extended meeting on the first day, my supervisors gave me a series of goals to reach. The overall scope of the project was to create a scalable recommendation system with pluggable algorithms on the Scala flavor of Apache Spark. In the months leading up to the internship, I had done research on how to implement a big system like this; it was daunting. Luckily, my task was broken up into several phases:

  1. Collaborative Filtering
  2. Content Based Recommendation
  3. REST API

The expectation was that these phases (and the subgoals within them) were levels, and the beauty of the internship was that I could strive to reach the highest level possible and land wherever my abilities could stretch to. Unlike some of my peers, I always had a next step to take right up to the end, meaning I was never bored at my desk or not learning.

I called my system Pomegranate.
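
To give a rough feel for what the first phase involves, here is a minimal sketch of item-based collaborative filtering in plain Python. It is not Pomegranate's code: the real system ran on Spark in Scala, and the toy `ratings` data, the function names, and the choice of cosine similarity are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not data from the project.
ratings = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "b": 2.0, "c": 5.0},
    "u3": {"b": 4.0, "c": 4.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, rated in ratings.items():
        for item, score in rated.items():
            items[item][user] = score
    return items


def cosine(v, w):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v[k] * w[k] for k in v.keys() & w.keys())
    norm = sqrt(sum(x * x for x in v.values())) * sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        # Weight each of the user's own ratings by how similar that item is
        # to the candidate item.
        scores[candidate] = sum(cosine(vec, items[i]) * r for i, r in seen.items())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend("u1", ratings)` ranks the items that `u1` has not yet rated. Storing each vector as a dict keeps the work proportional to the ratings that actually exist, which matters once the data is large and sparse.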

Software and Flows

I’d like to have a special section devoted to the software I used. While I had done research on using Spark and Scala, this did not quite prepare me for the challenges ahead. Spark is designed to work on large clusters, whereas I was testing on a single machine. This meant some Bad Times were ahead, and I spent a lot of time learning how exactly to tune my system to work where it needed to. My academic school work had more or less always skipped this part – starter code and a common cluster to ssh into meant that I focused on the code, not the settings that make the code work. I consider this tuning one of the most valuable things I learned in my time at NaviPlus.

I also learned a lot about my workstation and development. I’ve always been a fan of vim, but my coworkers’ extensive .vimrcs led me to think about how I use it, and with some fat dotfiles of my own now and tmux under my belt, I have them to thank for my newfound speed. They also taught me the importance of TDD and how to actually use git. I went from sitting on one lame master branch to developing on a develop branch and working with feature branches and releases, not to mention readme and wiki conventions. The same goes for unit tests; while my tests still aren’t perfect, thinking about how testing and encapsulation can drive a project has made me a better programmer and thinker.

Last, physically setting everything up was a challenge. Rarely have I ever needed to actually write the build file, figure out the file structure, and build the skeleton. A lot of assignments ask for just the meat, not the bones. Learning how the skeleton fits together (and actually fitting it together) was quite an experience.

Outcomes and Challenges

I finished the internship with a tuned collaborative filtering system and a not-so-tuned content based recommendation system. I managed to hit level two, and that felt great. Part of the difficulty in this project came from the single machine workstation, because taking the Cartesian product of 30000 rows each 10^10 long is not feasible on one Mac. On the other hand, it’s also unrealistic even with the luxury of a distributed system; better algorithms, smarter code, and more forethought were the ideas that drove Pomegranate. If I could optimize on a single machine, then the speedups would also carry over to a larger cluster.

Figuring out where the problems were on my two thread local mode also gave me a better understanding of Spark and MapReduce systems in general: how the driver and executor interact, how persisting data on the executors can save time, and when that’s a bad idea. Even though tuning for one system is specific, the methods and thought patterns carry over, so I am confident in working on a larger cluster now as well.

Evaluation was also a challenge. I do not have a background in stats, but mean average precisions and root mean square errors still had to be calculated. Not to mention cross validation, which was its own can of worms, as it took hours to go through the many parameters that would determine what’s best for the data. The upshot is a nuanced understanding of what those statistical methods are about; even within the project, it was a lot easier to move from algorithm A to B than from nothing to A.
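
For readers who, like me, do not come from stats, the two metrics can be sketched in a few lines of plain Python. This is an illustration rather than the evaluation code from the project. The function names are hypothetical, and average precision is normalized here by the number of relevant items, which is one common convention among several.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Mean of average_precision over users, given as two parallel lists."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0
```

RMSE fits predicted ratings (how far off was each predicted score?), while mean average precision fits ranked recommendation lists (did the relevant items land near the top?). Cross validation then repeats measurements like these over several train/test splits for each parameter setting, which is why it took hours.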

presentation

Japan?

I went to Taipei with some friends before the internship started, and after three months in Kyoto, landing at Narita felt like coming right back home. This was not the case. After the buildings under five stories in Kyoto, casting off into the largest city in the world was somewhat daunting. I slept in a capsule hotel, and soon I was moving into my house. My apartment was a capsule in its own right. Conbini meals eclipsed regular meals after a little while.

All in all though, I did feel comfortable in the city. My terrible Japanese was enough to get around and not feel lost, and I found my way to events on my own. I really enjoyed going to a coffee tasting at a coffee shop I frequented and chatting with the barista. Waiting in line for sushi at 築地 at 3:45am, twice (!), was also worth it both times.

My coworkers made sure I had a great time whenever they could. We had daily lunches out in and around 恵比寿・代官山, so I tried everything from curry to burgers to sushi to roast beef rice bowls. All of it was absolutely delicious.

lunch

One excursion that comes to mind is the day we all went to 川越 for sightseeing and eating. As blisteringly hot as it was, it was a great time. 氷川神社 was beautiful, as was the eel. A few weeks later, we went to Alohidden, an Uzbek restaurant where I feasted on my native dishes. Making friends with the people who guide you in your work is something that led to my success at NaviPlus, and turned Japan from a foreign country into a familiar place.

kawagoe


Conclusions

I had a great time studying abroad and at my internship. Despite the bugs and absurdity of the past six months, between Kyoto, Tokyo, and the rest of East Asia I managed to see, I grew as a student, a worker, and a person. The thing I cherish the most is knowing that it doesn’t take much more than an open mind and good people to adapt to a completely new place. I was lucky, of course; the Stanford center was a cultural liaison I could always depend on for advice. Compared to someone thrown into another place with no contacts whatsoever, I had it easy. Still, I have the tools to adapt if I ever find myself abroad again. I think I will.
