

Author: Yunong Xiao (肖雨浓) | Published June 21, 2018 | Estimated reading time: 42 minutes




Xiao: I currently lead the FaaS and API platform at Netflix. The Netflix API is a tier-1 service through which every request from every Netflix client flows. It allows us to integrate the hundreds of microservices on the backend into one coherent service for clients to access. We're building a FaaS platform to enable engineers to quickly develop, test, and operate these API services -- which are generally bespoke to each device.


Xiao: At Netflix, we design our product with innovation in mind. What this means is that we're constantly A/B testing our product and launching many new features each week. In order to enable this kind of velocity, we require an API services platform that lets client engineers rapidly deploy changes to their services in production. FaaS achieves this by abstracting away all of the platform components usually associated with a service, leaving just the business logic itself -- allowing engineers to focus on developing great new features instead of writing boilerplate code.


Additionally, operating services at more than four 9s of availability is difficult -- even for seasoned server engineers. Thus a serverless model in which we centralize operations lets us provide a platform on which even engineers without server and operational experience can develop highly available services.
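To put "four 9s" in concrete terms, the short calculation below converts an availability target into an allowed downtime budget. It is plain arithmetic rather than anything specific to Netflix; the function name `downtime_budget_minutes` and the 365-day year are assumptions made for illustration.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Return the minutes of downtime permitted over the period at the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    # Compare a few common availability targets over one 365-day year.
    for label, target in [("three 9s", 0.999), ("four 9s", 0.9999), ("five 9s", 0.99999)]:
        print(f"{label}: {downtime_budget_minutes(target):.1f} minutes per year")
```

At 99.99% availability the budget works out to roughly 52.6 minutes of downtime per year, which gives a sense of why centralizing operations is attractive.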

InfoQ: Could you describe the architecture of the API Platform in more detail? How does the API Platform currently put serverless into practice?

Xiao: At a very high level, the API platform consists of a FaaS platform which allows engineers to deploy functions with custom business logic as highly available production services.
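As a rough illustration of the idea that a function author supplies only business logic while the platform supplies everything else, here is a minimal sketch. It is not Netflix's implementation: `Platform`, `Request`, `Response`, the `/profile` route, and the `profile` function are all hypothetical names, and the "platform" here only does routing, error handling, and request counting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    path: str
    params: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: dict


Handler = Callable[[Request], dict]


class Platform:
    """Hypothetical stand-in for the platform: routing, error handling, request counting."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}
        self.request_counts: Dict[str, int] = {}

    def function(self, path: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a function under a path."""
        def register(handler: Handler) -> Handler:
            self._routes[path] = handler
            return handler
        return register

    def dispatch(self, request: Request) -> Response:
        # Cross-cutting concerns live here, not in the function.
        self.request_counts[request.path] = self.request_counts.get(request.path, 0) + 1
        handler = self._routes.get(request.path)
        if handler is None:
            return Response(404, {"error": "no such function"})
        try:
            return Response(200, handler(request))
        except Exception as exc:
            return Response(500, {"error": str(exc)})


platform = Platform()


@platform.function("/profile")
def profile(request: Request) -> dict:
    # The only code a function author writes: business logic.
    return {"greeting": f"Hello, {request.params.get('name', 'member')}"}
```

Calling `platform.dispatch(Request("/profile", {"name": "Ada"}))` runs the function through the shared wrapper, so concerns such as error handling stay in one place instead of being repeated in every service.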



Xiao: There are tradeoffs to consider with serverless. By adopting the FaaS model, you are essentially trading customization for velocity and perhaps availability. There are some applications where FaaS for services works really well -- as is the case for the Netflix API, where we run relatively uniform microservices that only need to access and mutate data from downstream services. However, if a service requires customization, such as needing to change various parts of the service platform (e.g. RPC, data access, caching, authentication), then the FaaS model may not provide enough flexibility for such services.

Our focus currently is to finish migrating the legacy API services over to the new stack. After that our focus could include many areas such as performance -- both to reduce cost and improve customer experience -- and other areas such as infrastructure and platform improvements.



Xiao: Functions are deployed as isolated services -- which means we're not deploying functions from different services on the same instance. This is really important for us, as we wouldn't want one misbehaving service to take down all of Netflix. This isolation helps us prevent large scale outages across all of Netflix. We also integrate with our internal metrics, alerting, and monitoring systems, which gives us visibility into the health of each service. The service platform contains modern load-shedding technologies such as concurrency limits and circuit breaking -- these generally help prevent large scale outages. We've also invested heavily in runtime debugging, profiling, and sampling, which provide the observability we need to operate many services at scale. There are many other components in the platform that help us run reliably; come to the talk, "Going FaaSter: Function as a Service at Netflix," to find out more!
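The two load-shedding techniques named above, concurrency limits and circuit breaking, can be sketched in a few lines. This is a generic illustration and not the Netflix service platform: `ConcurrencyLimiter`, `CircuitBreaker`, `Rejected`, and every parameter name are assumptions, and the breaker is deliberately simplified (it is not thread-safe and allows a single trial call after its cool-down).

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Rejected(Exception):
    """Raised when a call is shed instead of being executed."""


class ConcurrencyLimiter:
    """Rejects new calls once max_in_flight calls are already running."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def call(self, fn: Callable[[], T]) -> T:
        with self._lock:
            if self._in_flight >= self._max:
                raise Rejected("concurrency limit reached")
            self._in_flight += 1
        try:
            return fn()
        finally:
            with self._lock:
                self._in_flight -= 1


class CircuitBreaker:
    """Opens after consecutive failures, then fails fast until a cool-down elapses."""

    def __init__(self, failure_threshold: int, reset_after_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise Rejected("circuit open")
            # Cool-down elapsed: allow a trial call; one more failure reopens the circuit.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Both wrappers fail fast with `Rejected` rather than queueing work, which is the point of load shedding: a struggling dependency gets less traffic instead of more.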

In terms of dependencies, we allow users to import third-party libraries at will -- but of course this means engineers need to exercise judgment with respect to things like security and performance.


InfoQ: How should one decide between, or compare, a public cloud FaaS service and a self-built FaaS service on a private cloud?

Xiao: This comes down to the classic build vs buy question. I think one should be pragmatic when faced with this decision. When we were first designing our FaaS platform, we considered public options such as Lambda and App Engine. We would be happy to use off-the-shelf solutions if they fit our use case.

As it turns out, we needed a platform that integrated with the existing Netflix service platform components such as metrics, alerts, service discovery, and many others, and integrating these with high-level FaaS platforms would have been difficult.


Additionally, we needed full visibility into the services using the FaaS platform. Building it ourselves meant that we have full control all the way down to the operating system -- and we can give operators (ourselves) the tools and visibility to debug the services and platform.


Obviously there's a huge amount of effort, time, and cost that went into building our own FaaS platform -- so we don't make these decisions lightly. However, at the time we couldn't find an open source or public FaaS option that satisfied our requirements.


This doesn't mean others should follow in our footsteps. If there is an open source or public FaaS option that suits your requirements, then absolutely go and use it. Opportunity cost is also an important metric. Technology is just a means to an end -- and people should absolutely use the best tool for the job -- often this means buying and not building.

InfoQ: What advice do you have for combining CI/CD with FaaS?

Xiao: Providing a robust, first-class testing framework is important. We designed our FaaS platform with testing in mind. As a result, we created a testing framework with features such as first-class mocks and tight integration with the developer tooling, making it very easy for engineers to write unit, integration, and end-to-end tests using the FaaS platform.


One of the main advantages of our test framework is that it allows engineers to test their functions in isolation, either locally or on Jenkins -- without having to deploy code to the cloud. This ease of use incentivizes our customers to write tests -- which helps us improve the reliability of the service.
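Testing a function in isolation with mocks, as described above, might look something like the sketch below. It uses Python's standard `unittest` and `unittest.mock` rather than the Netflix test framework, which is not shown in this interview; the `continue_watching` function, the `viewing_client` dependency, and its `recent_titles` method are hypothetical.

```python
import unittest
from unittest import mock


def continue_watching(member_id: str, viewing_client) -> dict:
    """Hypothetical function: shape downstream data into a device-specific response."""
    titles = viewing_client.recent_titles(member_id)
    return {"member": member_id, "titles": [t["name"] for t in titles][:3]}


class ContinueWatchingTest(unittest.TestCase):
    def test_returns_at_most_three_titles(self) -> None:
        # The downstream service is replaced by a mock, so nothing is deployed or called remotely.
        client = mock.Mock()
        client.recent_titles.return_value = [{"name": n} for n in ["A", "B", "C", "D"]]

        result = continue_watching("m-1", client)

        self.assertEqual(result, {"member": "m-1", "titles": ["A", "B", "C"]})
        client.recent_titles.assert_called_once_with("m-1")
```

Saved as a module, this runs with `python -m unittest` on a laptop or a CI worker alike, since the downstream dependency is injected rather than reached over the network.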



Xiao: Today most serverless solutions are geared towards batch and event driven tasks which are not latency sensitive. However, we believe serverless should also be considered for production services, since it reduces operational and code complexity by abstracting away the platform and infrastructure.


For us, there was a clear need within the Netflix API organization for a FaaS model that supported service style workloads. From conversations with other companies, we believe there is an appetite for service style FaaS platforms -- for most teams, services are a means to an end -- they're not opinionated about how the service is implemented; they care only that it performs the business logic they need reliably, with good developer ergonomics.

I think FaaS is a natural evolution. Many years ago, most services used bespoke software up and down the entire stack, running inside data centers owned by each company. Today we're moving towards a model where we commoditize components further and further up the stack: we started by commoditizing hardware and data centers with IaaS (think AWS EC2), then moved towards commoditizing some parts of the platform with PaaS (think Heroku or Google Cloud Platform). The natural next step is FaaS, where everything is provided by the platform except the business logic, which is the function itself.


Xiao: One of the reasons we see so many FaaS platforms built on top of K8s is that K8s abstracts away the infrastructure and platform required for building scalable and reliable services on top of containers. This is powerful, as it means that FaaS frameworks can focus on the function runtime.


This space will continue to evolve, and I hope to see additional FaaS frameworks emerge -- especially ones that can fulfill the need for service style workloads at scale (think rich metrics, autoscaling, performance optimizations). I believe K8s will evolve in terms of its ability to run at larger scales -- this would make it an even better fit for use cases exceeding 5000 physical nodes.


Xiao: Engineers should be pragmatic and look to make incremental changes to the architecture. Changing everything at once significantly increases the complexity, risk, and timeline of the project. Making incremental changes means we can shorten the feedback loop, realize gains more quickly for the business, and reduce the risk by changing only a few components at a time.


We should balance the tradeoffs of each decision and seek to get broad alignment within the company and mine for dissent. Be judicious when it comes to adopting new technology -- ask yourself the question, "why are you picking this technology?" If you can't answer it in a way that satisfies your team or organization -- then you should think twice. Think about the implications of adopting new technologies. Does it have a broad user and support base? Does it provide a good set of tooling to operate and debug? What about documentation? How about the maintenance cycle? What is the impact to the organization as a whole by adopting a new technology -- will platform teams now need to support this new technology across the entire organization?


For example, we adopted containers for the FaaS platform for very specific reasons. It allowed us to enable engineers to run their services everywhere, and gave us immutable build artifacts. This decision didn't just impact our team -- it required us to create a new team at Netflix, which was tasked with building a container orchestration system. The decision to use new technology can often have rippling and unforeseen consequences up and down the entire company.


InfoQ: During the development of FaaS services, what do engineers care about most?

Xiao: For the development experience, we focused on the ergonomics of our FaaS platform. This was the biggest piece of feedback from engineers using the FaaS platform. As a result, we focused on building developer tooling that allows engineers to develop and debug their functions locally on their dev machines -- including the ability to tail logs and attach debuggers.


Xiao: Engineers should focus on the things that matter to their teams -- for most, this no longer means the infrastructure or service platform. For our engineers who use the FaaS platform, this allows them to focus on product innovation -- improving the Netflix experience for our more than 125 million members.




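Yunong Xiao (肖雨浓) is currently a Principal Software Engineer at Netflix in Los Gatos, California, where he leads the design and architecture of the Netflix API platform. Previously, he worked at AWS and Joyent, focusing on distributed systems, and helped plan and build several cloud computing products, such as AWS IAM and Manta. He also maintains restify, the open source Node.js framework. Yunong holds an honours degree in Computer Engineering from the University of Waterloo.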



