Panel: Sequential Data



Moderator: As you have maybe seen from the track, we go through a lot of different fields, we have natural language, we have time series data, there's music. Sequential data is a really broad field, that's where my first question comes into play. Do you see overlaps in the fields, in the approaches that are taken? Do you as well get inspired by other fields? Probably, the NLP people are way ahead of all the other ones.

Taylor: The thing I see in common behind all these approaches is that you're exploiting some structure that you know about the problem in advance. If you had to have a machine learning or a statistical model learn things from scratch, it would take it a really long time to do that. A lot of what I see happening is how can we, as humans, point these algorithms in the right direction by saying, "For Prophet it's like things that happen one second ago will be similar to things that happen in the next second or a week ago will be similar to ...?" That domain knowledge helps the model get a head start and puts it into a space where it's more likely to succeed at the problem. A lot of NLP methods exploit the same idea, that there's some structure to the way that language works that allows us to not have the model just have to learn things all the way from nothing.

Ameisen: Yes, that's really true. To add to that, that gets translated into the primitives that you use. One example would be the convolution as a primitive: CNNs started in images, then it turned out that CNNs work pretty well for NLP as well. Then it turns out that you can do time series forecasting using CNNs as well, as we saw for Uber. It turns out that for audio, you can look at a spectrogram and use a CNN as well, if you want to. A lot of these fields borrow from each other very frequently, and that works really well in practice.
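
To make the shared primitive concrete, here is a minimal sketch of a one-dimensional convolution slid over a sequence, written in plain Python. The function name `conv1d` and the toy series are illustrative assumptions, not anything shown by the panel; a real model would learn the filter weights and use a framework layer instead.

```python
def conv1d(sequence, kernel):
    """Valid (no padding) 1D sliding dot product, as applied by a CNN layer."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


# Hypothetical toy series with a single level shift.
series = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
# A difference filter responds only where neighbouring values change.
edge_filter = [-1.0, 1.0]

print(conv1d(series, edge_filter))  # [0.0, 0.0, 4.0, 0.0, 0.0]
```

The same sliding-window operation applies whether the sequence is a row of pixels, a run of token embeddings, a time series, or a slice of a spectrogram, which is why the primitive transfers across fields.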

Grus: Yes, the thing that occurs to me is that when you work with some of these constructs within a field, say, RNNs or LSTMs for, say, NLP, then you start thinking about, "What are the common abstractions between these?" So rather than thinking about my problem in terms of LSTMs, I'm going to think about my problem in terms of something that takes a sequence and gives me back a sequence. Once you move to those higher abstractions, it's easier to see a way to apply them in other domains.

Choi: That was a very important component, the abstraction is the key. Many people have asked me something like "Because this is audio, it has to be LSTM, right?" It's not like that, these days, as people have talked already, some of the ideas for [inaudible 00:03:02], some of the sequential data originally, is then easily transferred to each other. For example, in audio or music, some problems are very sequential.

When it comes to automatic generation of music, or melody, or rhythm, one can easily start to solve the problem with the exact same abstraction and understanding as in NLP and text. At the same time, it is not really sequential at all when it just comes to classifying the music genre, for example; then it's just like an image. It does not depend on either LSTM or CNN, as Joel already said; it's more about the nature of the problem.

Recent Advances in Your Field

Moderator: What do you see being recent advances in your field? We covered already convolutions, like going everywhere, which is one of the big things currently in sequential data. Is there something apart from that, do you see, impacting your field?

Choi: Some good examples would be things like the Transformer architecture, which is very actively used in music generation and understanding, and things like WaveNet, which was originally designed for speech waveforms. It was itself motivated by image generation originally, and is now actively used for music generation.

Grus: This is what my talk was about yesterday. For me, the big recent takeaway is transfer learning, and how we can take these giant pre-trained models that are trained to do one thing, and almost out-of-the-box they work amazingly well on all these other problems. That's a really important direction for solving the problems I think about, in terms of natural language understanding.

Ameisen: What's really interesting is, transfer learning is almost an automated way to give a really good prior to your model. You say, "Well, here's a bunch of language, here's Wikipedia, learn some stuff from it." It's very likely that some of the things you learn from that will be useful for this current task. What's exciting is that that's worked for images, that's worked for NLP.

I'm going to tee this one up for you, one of the questions that I've been having over the last few months of exciting new research approaches is, "Could you think of something similar for just general time series?" Probably most retail businesses have similar curves. Could you train a general model and then do transfer learning in some way? That'd be exciting as well.

Taylor: I'm going to skip talking about that, because everyone can fill in the blanks on that. I'm going to take it in a different direction. When people ask me about what tools I use, I usually say, "Tools are boring. That's not the part of the problem I want to talk about." In the last few years the thing that's most exciting is the ability to train these models, which is something that we just didn't have five years ago. That's two things. It's frameworks for doing it, like PyTorch, TensorFlow, and all the ecosystems around them. For Prophet, it was Stan, which is a probabilistic programming language. They all perform the same task, which is that they allow you to abstract away how the model is fit from what the model expresses. That means that you can stay focused on the modeling exercise and not on how you're going to optimize the parameters.

At the same time, the software engineering has gotten a lot easier, so we have better paradigms for training and monitoring these things. Training a neural network by backprop five or six years ago would have taken you a really long time to get right, and now it's becoming much more reliable and robust. This is unlocking a ton of productivity for researchers in many different fields. It's worth just acknowledging that really it is sort of a tooling thing that we solved. GPUs have gotten faster, these frameworks have gotten better, and we've gotten better technologies for optimizing parameters given data. That just unlocks a lot of value.

Leveraging Semantical Ontologies in Developing Models

Participant 1: I have a couple of questions, one is in relation to semantics. There are ways to define different ontologies of words, especially when we work with unstructured data, such as natural text. Are there any approaches where you can leverage semantical ontologies where different words can be classified when you prepare data to be processed? Have you used any of these things? Do you know anything about them? Are they useful, when you try to develop some models, in developing better models?

Grus: With the success of things like BERT and ELMo, people are definitely looking at, "Ok, now we've done this on unstructured text, what if we know something a little bit more about the semantics of language? Can we incorporate that into these models?" That's an active research area where I haven't seen a lot of results yet, but I would expect that you probably will over the next year-ish.

The thing is that when you get these deep contextual models where I'm embedding a word, and that embedding really depends on the 100 words to the right and the 100 words to the left, what is it that I'm not capturing that is left for semantics? There may be something there, but it's not clear to me what it is so it's hard for me to have a lot of intuition on that.

How Machine Learning Can Understand Code

Participant 2: I was wondering if the NLP people talk about where we are in terms of state of the art, and where it's going to go for machine learning on code, how machine learning can understand code and maybe help or even write code.

Grus: There's something very different about understanding code than there is about understanding human language, and that's that human language is fuzzy. If you're trying to understand what I'm saying, and you get one of the words wrong, you probably still get the gist of it, but if you're trying to generate some Python code or a SQL query, and you get one of the words wrong, it's not going to compile. It's this all or nothing thing, so the approach of "I'm going to take a bunch of text, and learn a giant neural network, and have it work," is typically not sufficient. You need some sort of actual parsing structure on top of it that says, "I have some notion of what a valid Python program looks like. I have some notion of what a valid SQL query looks like. I'm going to parse into that syntax, rather than just generating stuff." That is a very active area of research. I don't think we have the impressive results on it yet that we have on language, from things like BERT and ELMo.

Ameisen: I agree, to build on top of that, a lot of these problems are about how you frame them. If you frame me a problem saying, "Hey, build me a model that's going to generate correct Python code," that's pretty hard. There are a couple of other ways you could frame it. One project I worked on at Insight was you take a snippet of code from some coding interview and say whether it's correct or not. It turns out to be surprisingly easy, because if you just count the parentheses, for example, you can already eliminate a lot of factors that way. It turns out if you go from a simple model to a really complicated model, where we did sort of a recursive neural network, you get a few percent more. We're still pretty far from understanding the code, we're not there yet.
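
As an illustration of how cheap such a baseline can be, here is a minimal sketch of a bracket check. It is not the Insight project's code: the function name `brackets_balanced` is hypothetical, and the sketch uses a stack (slightly stronger than pure counting) while ignoring brackets inside strings and comments.

```python
def brackets_balanced(code):
    """Return True if (), [] and {} are properly nested in `code`.

    Simplifying assumption: brackets inside string literals or comments
    are treated like any other bracket.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


print(brackets_balanced("print(sum([1, 2]))"))  # True
print(brackets_balanced("print(sum([1, 2])"))   # False
```

A snippet that fails this check cannot be valid, so a single hand-written feature already filters out many incorrect answers before any learned model is involved.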

The other thing I will say is, generating code is hard, but finding good representations for code is doable, there's a lot of interesting work on that. Facebook has a couple of blog posts on that, the GitHub team have a couple blog posts on that, if you generate a really good representation of natural language text from the docstrings, which is something that's pretty well understood, and then you do the same for code using some similar methods, and then you find a way to put those embeddings close to each other, you can get natural language code search pretty well, where I type a query, and instead of generating the code, you're just like "Hey, you asked me for 'How do I read a file in Python?' Here's a repository that has probably that function." That's way easier to do.
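
The retrieval step described here can be sketched as a nearest-neighbour lookup in a shared vector space. In the sketch below the "embedding" is only a bag-of-words count vector standing in for a learned encoder, and the function names and the tiny docstring index are hypothetical examples, not the Facebook or GitHub systems.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical index: function name -> docstring.
docstrings = {
    "read_text_file": "read a file from disk and return its text",
    "post_json": "send a json payload over http",
}


def search(query):
    """Return the indexed function whose docstring is closest to the query."""
    q = embed(query)
    return max(docstrings, key=lambda name: cosine(q, embed(docstrings[name])))


print(search("how do I read a file in python"))  # read_text_file
```

With learned embeddings for both docstrings and code, the same lookup returns existing functions for a natural language query, which sidesteps the harder problem of generating code that has to compile.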

Sequence Learning

Participant 3: I want to ask a particular question about sequence learning, the differentiation between the context and the causal relationship, because now, we are extending the embedding, or representation of the symbol in the sequence, by including more and more context. Does that pose a challenge for applying this embedding, in that we have to scrutinize whether the application scenario is the same? Is it the same context or the same scenario as where we learned it? When we are getting more advanced in embedding, do we also pose that challenge to ourselves?

Ameisen: I know we have the chief of context that can probably chat about it some more. I'll just say one thing that I think is interesting about what you mentioned, which is transfer learning is amazing. It works really well in practice, but you have to think of what your current application is and maybe for your current application it isn't useful to leverage the entirety of the context of the English language, maybe you have a very specific application where transfer learning could be detrimental, where that context couldn't help you. You have to just take that with a grain of salt, depending on what is it that you're doing, because sometimes that context is very useful. Sometimes it isn't.

Grus: This is something I meant to talk about in my talk, and I neglected to - so I'm glad that you asked it - which is the following. You're totally right, which is why when you start with, say, BERT, you have to fine-tune it on your specific problem domain. If you're looking at a certain class of documents and trying to classify them or whatever, BERT, out-of-the-box, is probably not going to do great and you will need to fine-tune it on labeled examples of what you're trying to accomplish. The thing that the contextual embeddings buy you is that those contextual embeddings are trained on millions, and millions, and millions of words.

You get so much information baked into those embeddings already, whereas when you're doing some classification problem, you might have 25,000 examples, or 100,000. You have so much less data that you're fine-tuning on that if you started with GloVe vectors, say, and then tried to learn all the context when you're doing the training, you're learning that context with so much less data that when you have the BERT model, which is trained on so much text, even if the context is in a slightly different domain, it gives you such a rich starting point that starting with that and then just doing fine-tuning on your data gives you a much better representation of the problem. That's my intuition, anyway.

Machine Learning & Argument Extraction

Participant 4: With regards to where this field is heading, I've been reading about some work that's been taking place in Europe, University of Dundee. It's relating to argument mining, it's only research work, there are challenges behind it. The main idea is to develop some sort of a machine learning solution that can actually read text and extract arguments from it. With regards to your experience, do you know of any approaches? How far are we from something like that? At the moment we can parse text, identify its different words, obviously in it, and process them in different ways, but when it comes to more complex tasks where we try to infer meaning, how far has the field reached and where do you see us heading from where we are in the next couple of years?

Grus: I don't know much about this particular problem, the thing it reminds me a little bit of is summarization. You take some text, and you want to summarize it. We're starting to be good at that, with generating summaries rather than just extracting them from the text. I don't really know enough about this particular problem to say.

Ameisen: I don't know much more, but what I will say is that I also think of summarization. What you were talking about seems to maybe be closer to extractive summarization, "Here's a bunch of text. Pick the few sentences that are key." That works really well, in practice, it's how most summarization is done. As Joel was saying, what the field is currently working on is abstractive summarization, where you generate it yourself, but that's a different problem. I don't know of particular examples, I'm a purist.

Taylor: I'm going to pivot this to time series, because there's a related problem in time series that I've been interested in, and it has a sort of similar flavor, which is, when you have millions or billions of time series, you'd like to say, "What's going on?" and when you have so many of them, you can't look at each individual one. It's like a summarization problem, like "I want to tell a story about how the system is functioning, even though I have tens of thousands of time series to look at."

There's a lot of easy stuff you can do with dimensionality reduction, and clustering, and stuff like that, but the hardest problem is, when we propose, "Hey, here's a summary of what's going on," is that an adequate representation to a human? Which is the hard problem, we don't have good labels on, "What does a summarization of a large set of data look like to a human? Would they think that it makes sense to them?"

It's the same problem with text summarization, which is that if we don't have labels, we can produce summaries, probably in many different ways. That ground truth that that is a good summary is probably so hard to come by, and it's going to take us a while to get to the point where machines can talk to us. We've learned really well how to talk to machines, because we designed them, but to get to the point where they can tell us what's happening, in terms that we understand, is probably going to be a lot longer of a process.

Comparing BERT, ULMFiT & ELMo

Participant 5: I haven't got the chance to deeply look into BERT and ULMFiT, but I'm curious what are your thoughts on BERT versus ULMFiT versus ELMo, generally. Are there cases where one works better than another? Also, specifically in the context of limited labeled data?

Grus: The short answer is, there's always context where one of them works better than the others. If you invented one of them, you'd probably rattle off those contexts off the top of your head. I feel slightly disloyal for saying this, because ELMo was invented by my colleagues, but I feel like if you were starting a project, you would probably at this point need a good reason not to start with BERT. I just feel like it performs best across a wide variety of tasks, and so I would personally start with it. If I found it didn't work, I might try something else.

Taylor: The old “try everything” approach.

Grus: Try the thing that's most likely to work.

Intersection of Audio and Text

Participant 6: I have a question about problems at the intersection of the domains you talked about. Are there interesting problems at the intersection of audio and text, or text and time series, or the other combination?

Grus: There's a whole field - Amazon employs thousands of people working on Alexa who are working on problems at the intersection of audio and text, so there's a ton there. That's not my field, so I don't know a ton about it, but there's certainly a lot there.

Choi: That's similar to the motivation for text summarization. When you have music, which is 10 minutes, 3 minutes, or some other audio signal, people want some very condensed representation that they can understand without really listening to everything. There's caption generation, which is a popular problem for images, but also for audio files, music, and speech.

Ameisen: I'd say there's a bunch of problems that are really interesting, in terms of grounding one representation in another. A lot of what you're talking about is, "Can you go from an image to text?" That can be image captioning, it can be an image that's actually a spectrogram, it can be an image where it's a screenshot or a sketch of a website, and the text is the HTML to generate it, if you want to automatically generate it. There's a lot of really cool practical projects that work well on that.

The way that I started thinking about it is most of these individual domains have ways to build dense representations. Then once you have that, usually you can try to find a way for it either to leverage the representations in one domain to go to the other, or you can try to do something end-to-end, if you're feeling comfortable. There's a lot of really interesting things on going from one end to another in different ways for different problems. Anything that touches those where you can find a representation, you can find a use case.

Taylor: I might think of all the stuff that's going on in NLP and text as the starting point. For anything we're going to be doing in machine learning, we'll probably start with text, because it's such a great encoding of reality and semantic content on its own. When you think about areas for overlap, or fruitful areas, text is the leading indicator of what we'll be doing in other domains. It's just going to take longer to get to the point where we have good features in other spaces. In time series, we don't have good representations. In images, we have poorer representations than we do in text. It's just going to be a matter of how long it takes us to get to representations that are as good as the ones that we have in text. Also, the volume of data that we have for text is laughably large, because humans like to generate a lot of it.

Grus: Just to hype my employer a little bit, one of the teams at Allen Institute focuses mostly on vision, but also on problems at the intersection of vision and text. They released a project a month or two ago called Iconary, where you can play Pictionary against an AI in both directions. Rather than drawing it, it picks emoji-like icons and arranges them. Either it will think of something, and it will show you the pictures to try and get you to guess it, or you can do it the other way around, where you choose the icons and make a picture out of them, and the AI has to take that and try and figure out what you're trying to do. If you Google for Iconary, you can find it, it's a pretty interesting demo.

Moderator: I can add to that as well, it's one of the reasons why the panel is as well-situated as it is today. I worked in time series data before. The thing is, if you work with time series data, you start with tools like Prophet and you establish a baseline with that, because it's easy to use. It gives you immediate results, and you can still understand it. Then you take a lot of inspiration from all the amazing NLP networks which are out there, and you can try stuff like attention on time series data, where you then can see, "Ok, which of the inputs influence my output?"

There's some work on uncertainty in deep learning, which really applies a lot when it comes to time series data. Of course, you have this topic which interleaves when you do pictures to text or the other way around, or audio to text. As well, you can take a lot of the models from the other fields and just apply them to your own, and you don't have to worry whether it's learning something, because you know it learned something in the other field.

The Appeal of Natural Language Processing

Participant 7: Data science is a vast field, you have all chosen to specialize in NLP. I'd really like to know what is it exactly that fascinates you about natural language processing? With regards to your experience, what would be that advancement that you would personally think that it would be a breakthrough in the field?

Grus: What appeals to me about natural language processing is two things. One, the problems in it are very easy to understand and relate to, understanding what text means, anyone can appreciate that if they don't know the first thing about computer science. That's one aspect of it, the second aspect of it is that it's really hard, there are so many hard problems in NLP. Before I started doing NLP, I didn't appreciate how hard it was, and how far away we are from truly getting computers to understand language. We're super far away from that still.

In terms of what would I consider a breakthrough, I don't feel like there is that one breakthrough. I feel like it's going to be incremental steps, and incremental steps, and incremental steps, it's not like there's one thing where it's like "Ok, we're done." If you look at the history of the field, it's, "Here's a problem we're all working on. We've made a huge advance. Ok, let's focus on the places where that new model that we came up with does really poorly. Ok, that's our new problem, because that's what we suck at." Every new breakthrough reveals a new set of hard problems that, "Wow, we're still bad at that.", it just keeps advancing like that, rather than in one giant step.

Ameisen: What's interesting to me about NLP is, because I was at Insight, I got to see hundreds of projects from fellows, also, we interview a bunch of people, so a bunch of people show us their cool project. You quickly get some sort of pattern recognition, but also you quickly get jaded. If anybody ever shows me the steering prediction or stop sign detection project ever again, I would go crazy. I kept seeing image projects, not to say, "Oh, computer vision is whatever." I kept seeing sort of similar projects, where it was like "Oh, I detect this. Give me an image, I detect that, or I classify it in there."

Then there was NLP, which initially, similar to you, I was like "Oh, yes, you can understand language." Then as soon as I started diving into, "What's the language model? What's named entity recognition?" All these tricky concepts of things that were very simple to explain, where it's like "Yes, I want to know if somebody mentions a city, or a state, or a hotel." You'd be like "Well, that's probably easy." That's not easy at all, that was a very compelling proposition.

The other thing I'd say that keeps me excited is that as a field it's moving extremely fast, and that there's a lot to be unlocked. When I worked at my previous company, we had hundreds and hundreds of customer comments that were saying, "Here's what I thought about your product." We literally could not go through all of them, because there were just so many. Even some simple tools that you have today, that weren't really easy to use before, could cluster that, or tell you, "Oh, these are the main things that we should improve on. People keep complaining about that," or that sort of thing. These are very concrete use cases that weren't possible five years ago, and that now anybody can do, importing three lines of Python. It keeps me excited.

Prophet & the Role of Feedback

Participant 8: My question is regarding the real world. What is the role of the feedback from someone that uses Prophet or the products that you produce, how does that work in your world?

Taylor: You're asking about how do I change Prophet in response to feedback from people?

Participant 8: Exactly, and what is the role of feedback when you create these awesome models and you put them in the real world, like you said as well, and then what happens?

Taylor: Feedback is great, but it's often hard to respond to. Joel talked about this with the ELMo versus BERT thing: some will work in some conditions, and some will not work in others. When we were developing Prophet, we had 12 core time series use cases that we wanted to get right, so every time we'd make a new change to the model, we'd run these. They weren't unit tests, but we'd test all these different cases, and we'd say, "Oh, we did better, mostly, across all of them."

Now, as hundreds or thousands of people use it and sometimes get good results, and sometimes bad, it's really hard to do that and to know that if I make a change to the model, that that will make them better or worse off. This is an interesting thing that we do as people who build tools, is that when you improve the tool, you don't improve it for everyone all the time. It can get worse for some people sometimes, you might not know that. The only real way that I can see to be robust to that is to try not to do too much.

I'm a big fan of the Unix philosophy, which is "Every tool has a really simple interface and does one thing really well." They become exchangeable and improvable, or you can swap new versions of that in and out. That's the main way I think of it. You're battling complexity, and you need to keep your tools and your interface really simple, try to make sure that you cover everybody's use cases as best as you can, but also say when you're not willing to solve someone's problem because it's not within the purview of what you were trying to do in the first place.

Grus: The thing I would add is that, in the field that I work in, people make advancements usually by applying new techniques to these same datasets that everyone in the industry is working on. It's dangerous to overfit to these datasets, that example that I showed in my talk about how if you ask, "How many hotels are there in San Francisco?" it chooses 55 because it wants to find a number, and that's the only number in the sentence.

With these models, there's not so much customer feedback, but people try and break them. When they try and break them, they find out, "Gosh, you assumed this thing that was true in your dataset, but your model's actually really dumb." One other example I didn't talk about, but which is really interesting, is in textual entailment, where you have two sentences, and you want to say, "Do they contradict each other?" Well, those sentences were generated by having Mechanical Turkers take sentences and make them contradict each other. They did things like "Oh, the first sentence mentions cat. I'll change this to dog in the second sentence."

What some people discovered, when they took these state of the art models on textual entailment, was that if you take away the first sentence, the model can still learn to do really well, and predict contradiction, even without seeing the first sentence. Things like dog in the second sentence are a good sign that it's a contradiction, because that's how people made the contradiction: by changing things to dog. Having people really hammering on these models that are state of the art really helps you find their shortcomings, and helps you say, "Gosh, we thought we were good. We're not good, let's go figure out, 'These are the hard problems,' and how to do them better."
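
The kind of probe described here can be sketched as a "second sentence only" baseline: a classifier that never reads the first sentence. The sketch below is a word-count voter, not a state of the art model, and the four training pairs are invented for illustration rather than taken from any real entailment dataset.

```python
from collections import Counter, defaultdict

# Invented toy pairs (first sentence, second sentence, label).
train = [
    ("A cat sits on a mat", "A dog sits on a mat", "contradiction"),
    ("A cat sleeps", "A dog sleeps", "contradiction"),
    ("A man plays guitar", "A person plays music", "entailment"),
    ("A woman runs outside", "A person runs", "entailment"),
]


def fit_second_sentence_only(examples):
    """Count labels per word of the second sentence; the first is never read."""
    counts = defaultdict(Counter)
    for _first, second, label in examples:
        for word in second.lower().split():
            counts[word][label] += 1
    return counts


def predict(counts, second):
    """Let each word of the second sentence vote with its label counts."""
    votes = Counter()
    for word in second.lower().split():
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None


model = fit_second_sentence_only(train)
print(predict(model, "A dog barks"))     # contradiction
print(predict(model, "A person sings"))  # entailment
```

If a baseline like this scores well on a benchmark, the labels are leaking through annotation artifacts such as the word "dog", and the benchmark is measuring less understanding than it appears to.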

State of the Art in Topic Modeling

Participant 9: Could someone speak to where we are state of the art in terms of topic modeling? If you have a set of documents with text, and you're trying to understand what the different topics are across those set of documents, and classifying those. Can someone speak to that problem?

Ameisen: Yes, that's an interesting one to me because there are the traditional topic modeling approaches that would, in an unsupervised way, generate a bunch of topics. Then, you have hyperparameters where you say, "Well, we think there's like three topics in there," and then judge three topics. You look at the words in each topic and you're like "Hmm, this looks like maybe it's about spacecraft or something."

What I've seen from when I was at Insight, and at other companies, when they try to tackle these problems, is a two-step approach, where first you do that, and you do other things, just to cluster things in various ways. Then you identify the topics you care about, and then you make it a supervised problem. I feel like there are two things that I've seen work well in practice. One is to make things supervised. The other, when you're in industry, back to Joel's point, is that the standard dataset doesn't matter. Your dataset matters a lot, but you don't have a standard dataset, so work on the dataset.

By doing both of these things, not saying "Oh, we're going to do a great unsupervised model," but saying, "We're going to curate this really nice dataset of a couple topics, or 10 topics, or 100 topics we really care about, add labels to it, and train a supervised model that tells you, 'Hey, this belongs to this category.'" That works really well in practice.

Nonsymmetric Frequency-Based Solutions

Participant 10: Sean [Taylor], in your talk, you had a graph, you were trying to take time periodicity out of it. Then you came across an example where the period wasn't symmetric, and it didn't do so well. I'm wondering, what is the solution that one could apply to something that wasn't symmetric frequency-based?

Taylor: When you have complicated periods in data, one way to think about it would be that you just expanded your search space for the pattern in the data by a large order of magnitude, which is that there are so many different types of periods that you'd have to search. You could say, "Generate the basis expansion for all the possible periods," and then let the model try to fit that and figure it out. The problem is that any individual time series that you have has almost no hope of being large enough for you to be able to learn that. If you had a super long time series for many years, with very fine-grained measurements, maybe you'd have some shot at doing that. For any individual time series, you have to rely on some domain knowledge about the period, and the shape of it, to make any headway on it.
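
To show why the search space grows so quickly, here is a minimal sketch of a sine/cosine expansion for a set of candidate periods. The function name `fourier_features`, the candidate periods, and the number of harmonics are illustrative assumptions, not Prophet's internals or defaults.

```python
import math


def fourier_features(t, period, order):
    """Sine/cosine terms for one assumed period, up to `order` harmonics."""
    features = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features


# Hypothetical candidate periods, in days: weekly, monthly, yearly.
candidate_periods = [7.0, 30.5, 365.25]

# One feature row for the observation at t = 10 days.
row = [f for p in candidate_periods for f in fourier_features(10.0, p, order=3)]
print(len(row))  # 18
```

Every additional candidate period adds `2 * order` coefficients to estimate, so searching over "all the possible periods" quickly needs far more observations than a single series provides. That is where domain knowledge about the likely periods does the work.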

Very much in line with what I was describing, the goal of Prophet was to be biased toward the kinds of cases that you're likely to see in practice, which is like how the human brain works. We're not super flexible learners; we have structures that we impose on things that we expect. It's a trade-off, and it probably won't work well in a lot of scenarios. Andrew Gelman posted about Prophet when we first released it, because he was really excited that it uses Stan. He posted this example of lemur populations, which have a weird cycle to them, which is periodic but not on a yearly scale. He was like, "It probably won't work on this." I was like, "You're right, Andy," because we're not expecting anything like that. Businesses don't encounter time series like that in practice.

Participant 11: I could throw one at you that's pretty popular. It's the internet traffic, or any commerce, or any business, because our world has this large Pacific Ocean that isn't quite matched by the size of the Atlantic Ocean, so you get a double hump with one relatively diminutive.

Taylor: Yes, that's right. Often, that's an aggregation problem, if you aggregate everything, you end up with all the periods mixed together, and you can end up with weird things. I'm sure this happens in sound all the time, too, you're mixing together a bunch of different waveforms, and the result might not have the same patterns as you'd expect. There'd be clear patterns if you were able to break them apart, but they wouldn't be present in the aggregate.
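The aggregation effect can be sketched with two made-up regional traffic curves (this example was not shown in the panel). Each region has a single daily peak, but their sum shows a large hump and a smaller one. The peak hours, the amplitudes, and the bump shape are assumptions chosen only for illustration.

```python
import math

HOURS = range(24)

def regional_traffic(hour_utc, peak_hour_utc, amplitude):
    """One smooth daily bump that peaks once every 24 hours."""
    phase = math.pi * (hour_utc - peak_hour_utc) / 24.0
    return amplitude * math.cos(phase) ** 8

def count_daily_peaks(series):
    """Count strict local maxima, treating the day as circular."""
    n = len(series)
    return sum(
        1 for i in range(n)
        if series[i] > series[i - 1] and series[i] > series[(i + 1) % n]
    )

# Hypothetical regions: peak hours and relative sizes are made up.
large = [regional_traffic(h, peak_hour_utc=3, amplitude=1.0) for h in HOURS]
small = [regional_traffic(h, peak_hour_utc=15, amplitude=0.4) for h in HOURS]
total = [a + b for a, b in zip(large, small)]

print(count_daily_peaks(large), count_daily_peaks(small), count_daily_peaks(total))
```

Modeling each region separately would recover two simple single-peak daily patterns, while the aggregate needs a more complicated shape to describe the same data.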

Datasets Containing & Training on Time Series

Moderator: We see huge advances in NLP, probably in part because NLP data are cheap to acquire: you just go on the internet and crawl it. Do you think we'll see datasets that contain, let's say, a considerable amount of time series that one can train on? The same applies to music, which has different restrictions when it comes to copyright. Do you think we'll see such data? Maybe even degraded data, like music at such low quality that nobody really wants to listen to it, but that we could use for deep learning. The same goes for time series: you said that if all the businesses joined their time series together, they would probably all be better off.

Taylor: Yes, you're absolutely right, producing datasets that provide a way for people to benchmark against one another is a super valuable activity. Just like the underlying training algorithms getting better, agreeing on what a good dataset is has been a big productivity unlocker for, say, the image recognition field. For time series, there are already the M3 and M4 Competitions, which gather together a lot of time series for forecasting. I haven't seen anybody try to do that in a more modern way, with a greater diversity of datasets. You're right, that's a really great opportunity for some researcher to come and unlock a lot of value.

Choi: Yes, copyright is a big problem, such a big issue in researching music especially. In corporate settings, if you widen the scope to audio in general, maybe including speech or sound, then thanks to smart speakers, people in at least some companies can have access to a huge dataset. That doesn't really help everyone else in the world, outside the company. One good source for music and audio research is YouTube, because people upload every random thing there, including some music. Especially for music, it's a bit of a gray area; for some reason, the law is different for music streaming services and video streaming services. With a video streaming service, people can listen to music on YouTube without even signing up. That helps me to crawl lots of music without concern about breaking the law. Then there are other cases, like audio datasets: fortunately, there are a bunch of different sources where people can download different kinds of audio.





Recorded at:

Jul 18, 2019
