
Megan Cartwright on Building a Machine Learning MVP at an Early Stage Startup


Today on the InfoQ Podcast, Wes speaks with ThirdLove’s Megan Cartwright. Megan is the Director of Data Science for the personalized bra company. In the podcast, Megan first discusses why their customers need a more personal experience for purchasing bras and how ThirdLove is using technology to help. She spends much of the podcast discussing how they got to a machine learning MVP for recommendations. Along the way, she covers the decisions they made on what data to use, how to get the solution into production, how to update and retrain models, and where they needed help. It’s a real story of an early stage startup using machine learning.

Key Takeaways

  • Buying a bra is often characterized by awkward fitting experiences and an uncomfortable product that may not even fit correctly. ThirdLove is a company built to serve this market.
  • ThirdLove took a lean approach to developing their architecture: it’s built on a Parse back end, and they leveraged Shopify to build the site. The company’s first recommender system was a rules engine embedded in the front end. After that, they moved to a machine learning MVP with a Python recommender service that used a Random Forest algorithm in scikit-learn (a minimal sketch of this kind of model follows this list).
  • Despite having data from around ten million surveys, the first algorithm needed only about 100K records to be trained. The takeaway is that you don’t have to have huge amounts of data to get started with machine learning.
  • To initially deploy their ML solution, ThirdLove first shadowed all traffic through the algorithm and compared its output to that of the rules engine. Using this along with information on the full customer order lifecycle, they validated that the ML solution worked correctly and outperformed the rules engine.
  • ThirdLove’s machine learning story shows that you can move towards a machine learning solution quickly by leveraging your own network and using tools that may already be familiar to your team.
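The podcast doesn’t include code, but a minimal sketch of the kind of Random Forest recommender described, assuming a flat table of quiz answers with invented column names, might look like this in scikit-learn:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical export of quiz answers joined to the size each customer kept.
df = pd.read_csv("fit_finder_sessions.csv")

# One-hot encode a handful of structured quiz answers (column names invented).
X = pd.get_dummies(df[["current_band", "current_cup", "fit_issue", "body_shape"]])
y = df["purchased_size"]  # label: the size the customer ultimately kept

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```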

Show Notes

Why is the bra market poised for disruption?

  • 02:10 I’ll start at the beginning – it’s a common story for women of my age (around 30).
  • 02:20 One of our founders (Heidi) needed a bra for a party she needed to go to.
  • 02:30 She ran over to Victoria’s Secret, got an ill-fitting bra (which was really expensive).
  • 02:35 She was so embarrassed that she hid the bag in her backpack and ran back to the campus.
  • 02:50 The bra didn’t fit, wasn’t comfortable – and she decided that women needed a better option.
  • 02:55 We’re stuck with two options - legacy retailers like Victoria’s Secret, which have a limited number of bras.
  • 03:10 The other option is fast-fashion companies like Zara or H&M, where you never know what’s in stock or what it looks like.
  • 03:20 A woman wears a bra every day for years, and so our mission at ThirdLove is to make women feel confident in their everyday lives.

Where did you start?

  • 03:55 We started wanting to be inclusive for all women.
  • 04:00 Currently, sizing is done by going into a store, having a stranger give you a size, and then wearing that size.
  • 04:10 We wanted to help women avoid having to go through that process – big-brand stores like Nordstrom are not in every market.
  • 04:25 We thought we’d do a mobile app - that’s how we started.
  • 04:35 The idea was that you’d take a picture - it would be very modern, you’d get a size based on the picture.
  • 04:40 What we found was that it worked, but it wasn’t as easy as we wanted it to be - so we wondered how we could make it even easier.
  • 04:45 Not every woman has an iPhone, so it’s not as inclusive as it could be.
  • 04:55 We wanted to ditch that effort, move to Shopify, and create a website.
  • 05:05 We created a quiz, and based on the answers would give you a recommendation of your size.
  • 05:10 We felt so good about it that we decided to start a programme called try before you buy.
  • 05:20 You can try the bra on, and if it didn’t fit, you could send it back without being charged.
  • 05:25 Once we did that, it just took off.
  • 05:30 We still have the try before you buy programme, but we’re looking at other programmes which may be more beneficial.
  • 05:40 We just launched a new initiative, called backup size, where you can get two sizes.
  • 05:55 That was due in part to the machine learning that we added to the product.
  • 06:00 What we found was that women could fit multiple sizes, potentially.
  • 06:10 Women’s bodies change over the course of a month – sometimes they get larger, sometimes smaller.
  • 06:15 It can be a little bit tough to distinguish what size you are on that day and what you might be going forward.
  • 06:25 While we know you are going to fit a specific size, some women just don’t want to wear that size – they don’t feel comfortable with the band being as tight as it needs to be for support.
  • 06:45 So why not give them a band a little bit looser?
  • 06:50 It’s a small percentage of our customers, but we want to make sure that every customer is happy.

So they may not accept the size, so the backup gave them the ability to try both?

  • 07:10 It’s very confusing if you’ve been wearing one size for ten years, and now you’re recommended to wear a different size.
  • 07:15 We recommend sizes and styles - some styles you may feel comfortable with a different size.

How do you recommend a bra size without measuring?

  • 07:45 We started the quiz - developed by Heidi and Rael - based on Rael’s experience of bra fitting over twenty years.
  • 08:00 She knew the questions and the pain points from the experience, and the quiz evolved from her knowledge.
  • 08:15 They knew they wanted to use machine learning from these questions, but they also needed data.
  • 08:20 This is the special story of ThirdLove – they saved the data in the back end for six months.
  • 08:40 They hired a head of data engineering, who put it in a data warehouse, and then they hired me as a data scientist.
  • 08:55 We have ten million data points – can we use this for something?

What’s the technology stack look like?

  • 09:30 We’re on Shopify, but the quiz is a JavaScript React web app, hosted on Heroku, with the server passing data back and forth between the front end and the back end.
  • 10:00 We added an endpoint to use for a machine learning API.

It seems like a lean approach.

  • 10:20 It’s a very lean approach - we wanted to develop the product and to be able to test different UI and UX, different wording of the questions.
  • 10:30 When we first developed the questions, they might have been too technical for some people.
  • 10:40 I didn’t know about different styles or sizing structures - I was a physicist, and while I’ve worn bras, I wouldn’t have known what a plunge was.
  • 10:50 Making those questions a bit more personable was something that the product and engineering team wanted to do, and so they spent most time working on that.
  • 10:55 Having a simple back end meant that we could iterate on that part.

What was it like at the start?

  • 11:10 It was a rules engine: on the client side, in the JavaScript application, there was a simple rules algorithm (the sketch below illustrates the general idea).
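The actual rules engine ran client-side in JavaScript and its rules aren’t described in the podcast; this Python sketch only illustrates the general shape of such a rules engine, with entirely invented rules:

```python
# Illustrative only: the real rules engine ran in JavaScript on the client,
# and these rules are invented, not ThirdLove's.
def recommend_size(answers: dict) -> str:
    band, cup = answers["current_band"], answers["current_cup"]
    # Example rule: straps digging in often means the band is too loose,
    # so drop a band size and go up a cup letter ("sister sizing").
    if answers.get("fit_issue") == "straps_dig_in":
        return f"{band - 2}{chr(ord(cup) + 1)}"
    return f"{band}{cup}"

# e.g. recommend_size({"current_band": 34, "current_cup": "B",
#                      "fit_issue": "straps_dig_in"})  ->  "32C"
```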

What was it like being the data scientist?

  • 11:45 At first, I was really excited - I had been told that there was all this data.
  • 11:50 Then I dug into the data warehouse - we had one data engineer at the time, and he worked hard to get the data into the warehouse, but it wasn’t in an easy-to-use format.
  • 12:05 We didn’t track every session of using Fit Finder; if you used it once, and then came back again later, we weren’t saving all the information.
  • 12:20 The very first thing I did was to save all the information, which filled up the data warehouse some more.
  • 12:35 The data was pretty structured, since the questions were pretty structured.
  • 12:40 That means that when we ask a question, like the current bra size, you can’t input some random size.
  • 12:55 We do have some unstructured questions, such as the existing brand.
  • 13:05 People would write in Victoria’s Secret, even though it was one of the drop-down choices available.
  • 13:10 There was some work to clean up the data, but I decided to be lean in my data science approach.
  • 13:20 The first approach was to find out if I could build an algorithm using the data we have.
  • 13:30 My plan was to see if it performed similarly to or better than the existing rules-based engine, and if so, hire an engineer to build out the API service.

How did you clean the data?

  • 13:50 I took all of the questions and answers that someone gave in a Fit Finder session, and then, if they made an order (which was a subset), I’d put that into its own table.
  • 14:05 I analysed the data - I was focused on an MVP.
  • 14:10 Since we didn’t have any historical data on the number of times someone took Fit Finder, pre- or post-purchase, I decided to only look at people who did Fit Finder once.
  • 14:25 I didn’t want to worry about cleaning up the features too much because I wanted to know what was there.
  • 14:30 I did have to make a call on grouping features together and how they could be encoded, because you don’t want to blow up the dimensionality of your feature space (see the encoding sketch after this list).
  • 14:45 You don’t want to have 200 columns within your algorithm - it may not perform well without a lot of observations (rows, basically).
  • 15:00 I wasn’t concerned about knitting features together, because I decided to take a random forest algorithm approach, which out of the box is one of the easiest ways to create a recommendation.
  • 15:15 Surprisingly we found that the simple random forest with simple features worked pretty well.
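The exact feature encoding isn’t described in the podcast; one common way to group features and keep the one-hot dimensionality down, folding rare answers into a catch-all bucket before encoding, might look like this (the function and threshold are assumptions):

```python
import pandas as pd

def encode_features(df: pd.DataFrame, max_levels: int = 10) -> pd.DataFrame:
    """One-hot encode quiz answers, folding rare answers into an 'other'
    bucket so the column count stays manageable."""
    grouped = {}
    for col in df.columns:
        top = df[col].value_counts().nlargest(max_levels).index
        grouped[col] = df[col].where(df[col].isin(top), "other")
    return pd.get_dummies(pd.DataFrame(grouped))
```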

How many points did your first algorithm use?

  • 15:35 I only used about 100k records for learning, which was a bit embarrassing.
  • 15:40 My Physics PhD was based on 50 data points, so I knew I could create something without a huge amount of data.

How many surveys did you have?

  • 16:00 We had more than 9 million surveys, but with my assumptions, which we’re now peeling back, it was a really small sample of clean data.
  • 16:15 It was only a subset of all the sizes that we carry; I wasn’t concerned with edge cases or the cold start problem.
  • 16:25 We added 24 new sizes after I launched the algorithm in the product.
  • 16:35 We still have a rules-based algorithm for the new sizes that we’ve added, and we now have a data scientist who is looking at that.

What did you use to get the learning algorithm off the ground?

  • 17:20 It had to be something like Python and scikit-learn; I’m familiar with R as well.
  • 17:30 For this project, we had to get up and running quickly, and we started with Python.
  • 17:40 The team was familiar with it, and we wanted it to fit in our infrastructure.
  • 17:50 No-one had previously built an API, as we didn’t have a back-end team – so I started asking my friends out to coffee.
  • 18:30 I thought it would involve some kind of Python REST-based API, maybe using Docker.
  • 18:40 I was having coffee with one of my friends, and he volunteered to code it for us.
  • 19:00 He was between startups doing contracting work, and we were grateful for his help building it out.
  • 19:25 We built everything in our first MVP from a single Docker image, with the same Python version and the same scikit-learn version.
  • 19:40 We exported it into Dynamo (the back end used to call the classifier); a sketch of such a serving endpoint follows this list.
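The podcast doesn’t name the web framework used inside that Docker image; a minimal sketch of a Python REST endpoint wrapping a pickled classifier and transform, with invented file names and route, might look like:

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("classifier.joblib")    # trained RandomForestClassifier
encoder = joblib.load("transform.joblib")   # fitted feature transformer

@app.route("/recommend", methods=["POST"])
def recommend():
    answers = pd.DataFrame([request.get_json()])  # one quiz session
    size = model.predict(encoder.transform(answers))[0]
    return jsonify({"recommended_size": str(size)})
```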

How long has ThirdLove been operating?

  • 20:00 We launched the Fit Finder quiz with the rules-based engine in mid-2016.
  • 20:10 They hired me at the end of 2017, and they had done a lot of tweaking on the front end of the quiz for about six months.
  • 20:20 I had about eight months’ worth of data when I started, and I got the data into a useful state with a couple of months’ work.
  • 20:55 We started actual work in March, and it only took a month because we already had the algorithms ready to go.
  • 21:00 The API service was built in about a week, and then we deployed it to the back end on Heroku.

How did the algorithm get refined?

  • 21:35 We needed to validate it, using a shadow launch.
  • 21:45 Every request coming through was also evaluated and fitted by the algorithm itself, with the results going through a firehose into S3 and our Redshift cluster (the sketch after this list illustrates the idea).
  • 21:55 We ran that for a week or so, and validated that everything was working right - we caught a few edge cases that way.
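As a sketch of the shadow-launch idea only (the helper functions are hypothetical stand-ins, not ThirdLove’s code): serve the rules-engine answer as before, but run the ML model on every request and log both results for offline comparison.

```python
def rules_engine(answers):      # hypothetical stand-in for the JS rules
    return "34B"

def ml_model_predict(answers):  # hypothetical stand-in for the ML API call
    return "32C"

def log_to_firehose(record):    # stand-in for the firehose -> S3/Redshift pipeline
    print(record)

def handle_fit_request(answers: dict) -> str:
    rules_size = rules_engine(answers)   # what the customer still sees
    ml_size = ml_model_predict(answers)  # shadow prediction, never shown
    log_to_firehose({"answers": answers, "rules_size": rules_size,
                     "ml_size": ml_size, "agree": rules_size == ml_size})
    return rules_size
```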

So you were shadowing the traffic using the rules engine?

  • 22:20 We wanted to make sure that we could validate it before going live.
  • 22:25 Once we had validated it, we used A/B testing on 10% of traffic, using a company called VWO for our A/B testing solution.
  • 22:45 I wanted to track all the touchpoints of a customer within a test, so we had to build some new infrastructure for the clickstream data.
  • 23:00 We turned on a VWO test, where VWO partitions the requests for us between the machine learning and rules back ends (a generic bucketing sketch follows this list).
  • 23:20 Now we’re defaulting to using the new algorithm.
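VWO handled the partitioning for ThirdLove; purely as an illustration of the general technique, deterministic bucketing ensures a returning user always lands in the same variant:

```python
import hashlib

def in_ml_bucket(user_id: str, rollout_pct: int = 10) -> bool:
    """Deterministically map a user to a bucket: the same user always gets
    the same variant, and roughly rollout_pct% of users get the ML backend."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct
```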

How do you retrain the algorithm?

  • 23:30 We upped it from 10% to 50% to 75%, concluding that the machine learning algorithm was a win.
  • 23:50 At that point we needed to build a proper team - in a startup, that means one person.
  • 23:55 Our engineering team is in Córdoba, Argentina.
  • 24:00 One of them had particular experience with back-end solutions and he reached out to me.
  • 24:15 We started working together, and deployed algorithm v2.
  • 24:25 It predicted more sizes than before; he took everything we had done and said we needed to think about continuous deployment and retraining the algorithm.
  • 25:10 So we moved to continuous integration with CircleCI - we are now pretty automated.
  • 25:25 The only non-automated part is producing the classifier and the transform function, which we have to send as a pull request (a sketch of such a retraining script follows this list).
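A retraining script that produces those two artifacts might look roughly like this; the file names, data export and feature handling are assumptions, not ThirdLove’s actual pipeline:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("latest_fit_finder_data.csv")    # hypothetical data export
y = df["purchased_size"]
encoder = OneHotEncoder(handle_unknown="ignore")  # tolerate unseen answers
X = encoder.fit_transform(df.drop(columns=["purchased_size"]))

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# These two artifacts are what gets reviewed and shipped as a pull request.
joblib.dump(model, "classifier.joblib")
joblib.dump(encoder, "transform.joblib")
```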

How do you evaluate whether it is more successful than a previous model?

  • 25:50 We are just finishing this week on that process.
  • 25:55 We decided to re-architect the back end.
  • 26:00 We have a team of 3 in Argentina, so we had to train them before we could start working again.
  • 26:10 It was mostly about how to redeploy the API and infrastructure within CircleCI; it’s not end-to-end, because we’re not there yet.
  • 26:20 We’re now looking how to automatically retrain the model each month, because our newer feedback data is better than our older feedback data.
  • 26:30 We are still a new company, and our product is amazing, but we’re in the final stage of onboarding new manufacturing companies.
  • 26:40 We don’t give anybody a product that is bad.

Are you using machine learning for recommendation only, or other parts of the business?

  • 27:00 We are using machine learning across the business.
  • 27:05 This was the first machine learning process for our product, but we’ve got some others for product returns and exchanges.
  • 27:20 We know that if you’ve bought something and you didn’t like it, we should make sure we get something you like.
  • 27:30 Machine learning and data science runs almost across every business function at this point.
  • 27:40 We’re using it in marketing, all of the acquisition and retention levers, financial planning, inventory forecasting.
  • 27:55 I feel that our data science (analytics, algorithms etc.) is there to inform and give better information to our business partners.

Is everything using the same technology?

  • 28:20 Some are, but not everything - I’ve hired a team of 4 over the last 6 months and let them use whatever they want to get something up and running.
  • 28:40 We are starting to backtrack a bit on that now - we want everyone to use the same best practices in Git.
  • 28:55 We’ve also built a media mix model, which allows us to understand our different marketing channels and what spending we should do.
  • 29:10 That’s hosted on Heroku as well.
  • 29:20 We’re migrating over to AWS.
  • 29:30 We’ve hired a VP of engineering, and I’m going to report to him - he loves machine learning, serverless architecture, and thinking about how to scale efficiently.
  • 29:45 The first part is to put all of Fit Finder behind an API and host that on AWS.
  • 30:05 We’re evaluating serverless at the moment - is it efficient, easy to use, reliable?
  • 30:15 We’re still a lean team, and if it allows us to move quickly then we will probably use it (the sketch after this list shows what that might look like).
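The team was still evaluating serverless at the time of recording; as an illustration only, serving the same pickled model from an AWS Lambda handler might look like:

```python
import json

import joblib
import pandas as pd

# Loaded once per warm container, so repeated invocations skip the disk read.
model = joblib.load("classifier.joblib")
encoder = joblib.load("transform.joblib")

def handler(event, context):
    """Hypothetical API Gateway -> Lambda entry point for size recommendations."""
    answers = pd.DataFrame([json.loads(event["body"])])
    size = model.predict(encoder.transform(answers))[0]
    return {
        "statusCode": 200,
        "body": json.dumps({"recommended_size": str(size)}),
    }
```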

What has been your big takeaway and lessons learned?

  • 30:50 I started here to build an algorithm in product - ideally in real time, from scratch.
  • 31:15 I was pleased by the fact that you can ask questions of your network and get a lot accomplished very quickly.
  • 31:25 You don’t have to be fancy or use deep learning - you can take a lean approach to machine learning.

More about our podcasts

You can keep up-to-date with the podcasts via our RSS feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and Google Podcasts. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.
