AI, ML & Data Engineering Content on InfoQ
-
LangChain - Working with Large Language Models, Made Easy
LangChain is a framework that simplifies working with large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM by providing abstractions for common use cases. It supports both JavaScript and Python.
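To illustrate the kind of abstraction described here, the following is a minimal, library-free sketch of a prompt template piped into a model call. It is not LangChain's API: `Prompt`, `Chain`, and `echo_model` are hypothetical names, and the model is a stub standing in for a real LLM client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prompt:
    """Hypothetical prompt template with named {placeholders}."""
    template: str

    def format(self, **values: str) -> str:
        # str.format fills the placeholders; a missing key raises KeyError
        return self.template.format(**values)


@dataclass
class Chain:
    """Hypothetical chain: format a prompt, then pass it to a model callable."""
    prompt: Prompt
    model: Callable[[str], str]

    def run(self, **values: str) -> str:
        return self.model(self.prompt.format(**values))


def echo_model(prompt: str) -> str:
    # Stub standing in for a real LLM call (for example, a request to a hosted model)
    return f"[model output for: {prompt}]"


chain = Chain(Prompt("Summarize in one sentence: {text}"), echo_model)
print(chain.run(text="LangChain wraps common LLM tasks."))
```

Because the chain only depends on a callable that maps a prompt string to a completion string, swapping the stub for a different model provider leaves the rest of the code unchanged, which is the main benefit such abstractions aim for.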
-
Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers
Pfizer upgraded the serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. They created a framework that uses a file processing pipeline built with AWS Step Functions and other serverless services, as well as a custom Python package for data ingestion and processing.
-
Meta's Voicebox Outperforms State-of-the-Art Models on Speech Synthesis
Meta recently announced Voicebox, a speech generation model that can perform text-to-speech (TTS) synthesis in six languages, as well as edit and remove noise from speech recordings. Voicebox was trained on more than 50,000 hours of audio data and outperforms previous state-of-the-art models on several TTS benchmarks.
-
AI, ML, Data Engineering News Roundup: Claude 2, Stable Doodle, CM3leon, Llama 2, Azure and xAI
The most recent update, covering developments from July 17th, 2023, showcases significant progress and announcements in the fields of data science, machine learning, and artificial intelligence. This week's focus centers on Anthropic, Stability AI, Microsoft, Meta and xAI.
-
Grammarly Replaces its in-House Data Lake with Databricks Platform Using Medallion Architecture
Grammarly adopted the medallion architecture while migrating from its in-house data lake, which stored Parquet files in AWS S3, to a Delta Lake lakehouse. The company created a new event store for over 6,000 event types from 40 internal and external clients and, in the process, improved data quality and reduced data-delivery time by 94%.
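The medallion architecture organizes data into bronze (raw, as ingested), silver (validated and deduplicated), and gold (business-level aggregates) layers. The sketch below shows that layering on in-memory Python lists; it is not Grammarly's pipeline or Delta Lake code, and the event schema (`id`, `event_type`, `client`) and event names are hypothetical.

```python
from collections import Counter
from typing import Any

# Bronze: raw events exactly as ingested (hypothetical schema), bad rows included
bronze: list[dict[str, Any]] = [
    {"id": 1, "event_type": "doc_opened", "client": "web"},
    {"id": 1, "event_type": "doc_opened", "client": "web"},  # duplicate delivery
    {"id": 2, "event_type": "suggestion_accepted", "client": "desktop"},
    {"id": 3, "event_type": None, "client": "web"},  # fails validation
    {"id": 4, "event_type": "doc_opened", "client": "desktop"},
]


def to_silver(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Silver: drop invalid rows and deduplicate by event id."""
    seen: set[int] = set()
    clean = []
    for row in rows:
        if row.get("event_type") is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        clean.append(row)
    return clean


def to_gold(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Gold: a business-level aggregate, here event counts per type."""
    return dict(Counter(row["event_type"] for row in rows))


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'doc_opened': 2, 'suggestion_accepted': 1}
```

Keeping the bronze layer untouched means the silver and gold layers can be rebuilt if validation rules change, which is one reason the pattern can improve data quality during a migration.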
-
GitHub Details Key Prompt Engineering Practices Used to Build Copilot
Prompt engineering is key to creating effective LLM-based applications and does not require a PhD in machine learning or generative AI, according to GitHub engineers Albert Ziegler and John Berryman, who also shared the lessons they learned developing GitHub Copilot.
-
JetBrains Unveils AI Assistant for IntelliJ-Based IDEs and .NET Tools
JetBrains, the software development company known for creating IntelliJ IDEA, has introduced a new AI Assistant in its Early Access Program (EAP) builds for all IntelliJ-based IDEs and .NET tools. The addition integrates generative AI and large language models into JetBrains' products, with the aim of changing how developers use these tools.
-
Google Releases Hive-BigQuery Open-Source Connector
Google recently announced the general availability of the Hive-BigQuery Connector, simplifying integration and migrations between Apache Hive and Google BigQuery. The open-source connector is a Hive storage handler that enables Hive to interact with BigQuery's storage layer.
-
Microsoft Introduces the Public Preview of Vector Search Feature in Azure Cognitive Search
At its annual Inspire conference, Microsoft recently announced the public preview of vector search in Azure Cognitive Search, a new capability for indexing, storing, and retrieving vector embeddings from a search index, aimed at building applications powered by large language models.
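Conceptually, vector search ranks stored embeddings by their similarity to a query embedding. The following is a minimal, exact (brute-force) cosine-similarity sketch in plain Python; it is not the Azure Cognitive Search API, the document ids are hypothetical, and the three-dimensional vectors are toy values standing in for embeddings produced by a model. Production services typically use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: dict[str, list[float]], query: list[float], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(index[doc_id], query), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" with hypothetical ids
index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.9, 0.0],
    "doc-c": [0.8, 0.2, 0.1],
}
print(search(index, [1.0, 0.0, 0.0]))  # ['doc-a', 'doc-c']
```

In an LLM application, the top-ranked documents would usually be inserted into the model's prompt as grounding context.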
-
Meta AI Reveals CM3leon, an Advanced Text-to-Image Generative Model
Meta AI has introduced CM3leon, a multimodal model that generates both text and images. According to Meta, it is the first model of its kind trained with a recipe adapted from text-only language models, and it delivers strong text-to-image results with high computational efficiency.
-
Microsoft Azure Managed Lustre for HPC and AI Workloads Now Generally Available
Microsoft recently announced the general availability (GA) of Azure Managed Lustre, a managed file system for high-performance computing (HPC) and AI workloads.
-
Introduction to Mojo Programming Language
Mojo is a newly announced programming language that combines the simplicity of Python with the speed and memory safety of Rust. It is at an early stage of development and offers users an online playground to explore its features. Mojo targets data science and machine learning workloads as a faster alternative to Python. Its creators plan to open-source it gradually.
-
Berkeley Open-Sources AI Image-Editing Model InstructPix2Pix
Researchers from the Berkeley Artificial Intelligence Research (BAIR) Lab have open-sourced InstructPix2Pix, a deep-learning model that follows human instructions to edit images. InstructPix2Pix was trained on synthetic data and outperforms a baseline AI image-editing model.
-
EU AI Act: the Regulatory Framework on the Usage of Machine Learning in the European Union
The proposal regulating machine learning applications was first published in 2021, and on June 14th negotiations to finalize the legislation started in the EU Council. The EU countries are expected to reach an agreement by the end of 2023. The AI Act takes a risk-based approach and aims to avoid disproportionate requirements when enforcing the rules.
-
Databricks Unveils Lakehouse AI and MosaicML Acquisition at Data + AI Summit
Data and AI company Databricks recently unveiled Lakehouse AI, a suite of tools for building and governing generative AI models, including large language models (LLMs), within the Databricks platform. Among the tools is LakehouseIQ, a "knowledge engine" that uses AI to understand a company's unique data, culture, and language in order to improve natural language interfaces such as chatbots.