AI, ML & Data Engineering Content on InfoQ
-
DoorDash Introduces ML to Understand the Marketplace's Status
DoorDash introduces an ML model to predict a store's operational status, aiming to improve the user experience and prevent thousands of order cancellations. Understanding a merchant's operational status and its ability to receive and fulfill orders is crucial for the DoorDash platform.
-
Amazon Athena Now Supports Apache Spark Engine
Amazon Athena now supports the open-source distributed processing system Apache Spark for running fast analytics workloads. Data analysts and engineers can use Jupyter notebooks in Athena to perform data processing and interact programmatically with Spark applications.
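As a rough illustration of the kind of interactive processing such a notebook enables, the sketch below assumes the pre-provisioned spark session that Athena notebooks expose; the S3 path and column names are hypothetical.

```python
# Hypothetical sketch: Athena for Apache Spark notebooks expose a ready-made
# `spark` session, so a cell can read, transform, and aggregate data directly.
# The bucket, path, and column names below are illustrative only.
orders = spark.read.parquet("s3://example-bucket/orders/")

daily_totals = (
    orders
    .filter(orders.status == "COMPLETED")
    .groupBy("order_date")
    .agg({"amount": "sum"})
    .withColumnRenamed("sum(amount)", "total_amount")
)

daily_totals.show(10)
```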
-
Google Address Validation API Is Generally Available to Improve Address Accuracy
Google recently announced the general availability of the Address Validation API. The new feature of the Google Maps Platform validates an address, standardizes it for mailing, and determines the best-known geocode location for it.
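As a hedged sketch, a validation request can be sent to the API's REST endpoint with a JSON body describing the address; the address and API key below are placeholders, and the full request schema is documented in the Google Maps Platform docs.

```python
import requests

# Illustrative sketch of calling the Address Validation API REST endpoint.
# The address and API key are placeholders.
API_KEY = "YOUR_API_KEY"
url = f"https://addressvalidation.googleapis.com/v1:validateAddress?key={API_KEY}"

payload = {
    "address": {
        "regionCode": "US",
        "addressLines": ["1600 Amphitheatre Parkway", "Mountain View, CA 94043"],
    }
}

response = requests.post(url, json=payload, timeout=30)
result = response.json().get("result", {})

# The response includes a verdict on address quality plus the best-known geocode.
print(result.get("verdict"))
print(result.get("geocode"))
```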
-
GitHub Releases Copilot for Business amid Ongoing Legal Controversy
GitHub has announced Copilot for Business, a business plan for its OpenAI-powered coding assistant Copilot. The release follows a recent class action lawsuit against Microsoft, GitHub, and OpenAI for violating open-source licenses.
-
eBay's New Recommendations Model Built with Three Billion Item Titles
eBay developed a new recommendations model based on Natural Language Processing (NLP) techniques, in particular the BERT model. The new model, called “ranker,” uses the distance score between title embeddings as a feature; in this way, the information in product titles is analyzed from a semantic point of view.
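As an illustrative sketch (not eBay's actual model), the distance between two title embeddings can be computed with an off-the-shelf BERT encoder and fed to a ranker as a feature; the model name and item titles below are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative sketch only: embed two item titles with an off-the-shelf BERT
# encoder and use their cosine distance as a candidate ranking feature.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(title: str) -> torch.Tensor:
    inputs = tokenizer(title, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pooled title embedding

seed_title = "Apple iPhone 12 64GB Unlocked Smartphone"
candidate_title = "Apple iPhone 12 Pro 128GB Factory Unlocked"

similarity = torch.cosine_similarity(embed(seed_title), embed(candidate_title), dim=0)
distance_feature = 1.0 - similarity.item()  # one input among others to the ranker
print(distance_feature)
```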
-
BigCode Project Releases Permissively Licensed Code Generation AI Model and Dataset
The BigCode Project recently released The Stack, a 6.4TB dataset of de-duplicated source code from permissively licensed GitHub repositories, which can be used to train code generation AI models. BigCode also released SantaCoder, a 1.1B parameter code generation model trained on The Stack. SantaCoder outperforms similar open-source code generation models.
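As a hedged example, the checkpoint published on the Hugging Face Hub as bigcode/santacoder can be loaded with the transformers library for code completion; the prompt below is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: load the SantaCoder checkpoint from the Hugging Face Hub
# and complete a short code prompt. The custom architecture requires
# trust_remote_code=True; the prompt is just an example.
checkpoint = "bigcode/santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```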
-
AWS Releases SimSpace Weaver for Real-Time Spatial Simulations
AWS recently released SimSpace Weaver, a managed option for running real-time spatial simulations across multiple EC2 instances. By distributing simulation workloads, the service can handle large real-world environments, crowd simulations, and immersive interactive experiences.
-
3D Point Cloud Object Generation from Text Prompts Using Diffusion Models
OpenAI recently released Point-E, an alternative method for generating 3D objects from text prompts that takes less than two minutes on a single GPU, versus other methods that can take several GPU hours. The new model is based on diffusion models, the same class of generative models behind GLIDE and Stable Diffusion.
-
Google AI Unveils Muse, a New Text-to-Image Transformer Model
Google AI released a research paper about Muse, a new text-to-image generation model using masked generative Transformers, which can produce images of a quality comparable to those from rival models such as DALL-E 2 and Imagen at a far faster rate.
-
Deep Learning Pioneer Geoffrey Hinton Publishes New Deep Learning Algorithm
Geoffrey Hinton, professor at the University of Toronto and engineering fellow at Google Brain, recently published a paper on the Forward-Forward algorithm (FF), a technique for training neural networks that uses two forward passes of data through the network, instead of backpropagation, to update the model weights.
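A minimal, illustrative PyTorch sketch of the core idea follows (not the paper's implementation): each layer is trained locally so that its "goodness," the sum of squared activations, is high on positive data and low on negative data, with no backward pass through the whole network.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a Forward-Forward layer (illustrative, not Hinton's code).
# Each layer is trained locally: high "goodness" (sum of squared activations)
# for positive examples, low goodness for negative examples.
class FFLayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, out_dim)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)  # normalize layer input
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness on positive data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness on negative data
        # Push positive goodness above the threshold and negative goodness below it.
        loss = F.softplus(torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Layers are trained greedily with two forward passes (positive and negative data).
x_pos, x_neg = torch.rand(32, 784), torch.rand(32, 784)
layers = [FFLayer(784, 256), FFLayer(256, 256)]
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```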
-
Waymo Developed Collision Avoidance Test to Evaluate Its Autonomous Driver
Waymo developed a testing framework called Collision Avoidance Test (CAT) to evaluate the ability of its Waymo Driver to avoid crashes or potentially hazardous situations, compared to a human driver.
-
Amazon Releases Fortuna, an Open-Source Library for ML Model Uncertainty Quantification
AWS announced that Fortuna, an open-source toolkit for ML model uncertainty quantification, has been made generally available. Any trained neural network can be used with the calibration methods offered by Fortuna, such as conformal prediction, to produce calibrated uncertainty estimates.
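Fortuna's own API is not reproduced here; as a library-agnostic sketch of the technique it offers, split conformal prediction for a classifier keeps, for each test input, every class whose softmax score clears a threshold calibrated on held-out data.

```python
import numpy as np

# Library-agnostic sketch of split conformal prediction for a classifier
# (illustrates the technique; this is not Fortuna's actual API).
def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # Nonconformity score: 1 - probability assigned to the true class.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Quantile of calibration scores with a finite-sample correction.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Prediction set: all classes whose score does not exceed the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy usage with random "softmax" outputs; real inputs would come from any trained network.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=100)
cal_labels = rng.integers(0, 3, size=100)
test_probs = rng.dirichlet(np.ones(3), size=5)
print(conformal_sets(cal_probs, cal_labels, test_probs))
```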
-
Generating Text Inputs for Mobile App Testing Using GPT-3
A group of researchers from the Chinese Academy of Sciences and Monash University have presented a new approach to text input generation for mobile app testing based on a pre-trained large language model (LLM). Dubbed QTypist, the approach was evaluated on 106 Android apps and automated test tools, showing a significant improvement in testing performance.
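The paper's actual prompt templates are not reproduced here; the sketch below only illustrates the general idea, using the completion API that GPT-3 exposed at the time and a hypothetical form-field context.

```python
import openai

# Illustrative sketch of the general idea: ask GPT-3 for a plausible value to
# type into a UI text field, given its surrounding context. The prompt format
# and field description are hypothetical, not QTypist's actual templates.
openai.api_key = "YOUR_API_KEY"

prompt = (
    "The mobile app page is 'Create account'. The focused input field has the "
    "hint text 'Email address'. Provide a realistic value to type into this field:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=16,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())
```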
-
Google Publishes Technique for AI Language Model Self-Improvement
Researchers at Google and University of Illinois at Urbana-Champaign (UIUC) have published a technique called Language Model Self-Improved (LMSI), which fine-tunes a large language model (LLM) on a dataset generated by that same model. Using LMSI, the researchers improved the performance of the LLM on six benchmarks and set new state-of-the-art accuracy records on four of them.
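A high-level sketch of the loop follows (illustrative; the helper functions are placeholders, not a real API): sample multiple chain-of-thought answers per unlabeled question, keep the reasoning paths that agree with the majority-vote answer, and fine-tune the same model on what was retained.

```python
from collections import Counter

# High-level sketch of an LMSI-style self-improvement loop. `generate_cot`,
# `extract_answer`, and `fine_tune` are placeholder functions.
def self_improve(model, questions, num_samples=32):
    training_examples = []
    for question in questions:
        # Sample several chain-of-thought answers for the unlabeled question.
        cots = [generate_cot(model, question, temperature=0.7) for _ in range(num_samples)]
        answers = [extract_answer(cot) for cot in cots]
        # Self-consistency: majority vote selects the most likely-correct answer.
        majority, _ = Counter(answers).most_common(1)[0]
        # Keep only reasoning paths that reach the majority answer.
        training_examples += [(question, cot) for cot, a in zip(cots, answers) if a == majority]
    # Fine-tune the same model on its own filtered generations.
    return fine_tune(model, training_examples)
```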
-
Microsoft’s New Memory Optimized Ebsv5 VM Sizes in Preview Offer More Performance
Microsoft recently announced two additional Memory Optimized Virtual Machine (VM) sizes, E96bsv5 and E112ibsv5, in the Ebsv5 VM family. Developed with the NVMe protocol, they provide up to 260,000 IOPS and 8,000 MBps of remote disk storage throughput.