Data Content on InfoQ
-
Nexla Launches Express: a Conversational Platform for AI Data Engineering
Nexla recently introduced Express, a conversational data engineering platform designed to dramatically lower the barrier for building data pipelines for AI applications.
-
Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data
Meta’s OpenZL is a compression framework built for structured datasets, where it outperforms traditional methods like Zstandard. With a universal decompressor and custom compression plans, it simplifies operational deployment while achieving better compression ratios and speeds.
-
Vercel Introduces Drains for Unified Data Export
Vercel has released Vercel Drains, a system for exporting observability data from its platform into external services. The feature unifies logs, distributed traces, web analytics events, and performance metrics into a single streaming mechanism.
-
Hugging Face Introduces AI Sheets, a No-Code Tool for Dataset Transformation
Hugging Face has released AI Sheets, an open-source application designed to let users build, transform, and enrich datasets using AI models through a spreadsheet-like interface. The tool, available both on the Hub and for local deployment, allows users to experiment with thousands of open models, including OpenAI’s gpt-oss, without requiring code.
-
TanStack DB Enters Beta with Reactive Queries, Optimistic Mutations, and Local-First Sync
TanStack DB is an embedded client-side database for frontend development, now in beta. Its reactive queries, typed collections, and optimistic mutations simplify state management and keep updates fast. The open-source library integrates with existing TanStack Query applications.
-
Google Launched LangExtract, a Python Library for Structured Data Extraction from Unstructured Text
Google has introduced LangExtract, an open-source Python library designed to help developers extract structured information from unstructured text using large language models such as the Gemini models.
-
Synthetic Data Generator Simplifies Dataset Creation with Large Language Models
Hugging Face has introduced the Synthetic Data Generator, a new tool that leverages Large Language Models (LLMs) to offer a streamlined, no-code approach to creating custom datasets. The tool facilitates the creation of text classification and chat datasets through a clear and accessible process, making it usable for both non-technical users and experienced AI practitioners.
-
Setting up a Data Mesh Organization
A data mesh organization consists of producers, consumers, and the platform. According to Matthias Patzak, the mission of the platform team is to make the lives of producers and consumers simple, efficient, and stress-free. Data must be discoverable and understandable, trustworthy, and shared securely and easily across the organization.
-
Data Teams Survey: Lag in DataOps and Value Delivered
We report on Jesse Anderson's 2024 Data Teams Survey, which showed a lag in DataOps capabilities, slow LLM adoption, and a concerning decline in perceived value creation by data teams. It called out the importance of teams that span data science, engineering, and operations capabilities. We also cover Petr Janda's recent podcast on the need for more engineering rigour for parity with other teams.
-
Anthropic Unveils Contextual Retrieval for Enhanced AI Data Handling
Anthropic has announced Contextual Retrieval, a significant advancement in AI systems' interaction with extensive knowledge bases. This technique addresses the challenge of context loss in Retrieval-Augmented Generation (RAG) systems by enriching text chunks with contextual information before embedding or indexing.
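The enrichment step described above can be illustrated with a minimal Python sketch. It is not Anthropic's implementation: the `Chunk` type, the `build_context` helper, and the sample data are hypothetical, and a fixed template stands in for however the context is actually produced (a real system might generate it with a language model).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str  # hypothetical field: title of the source document
    text: str       # raw chunk text as split from the document


def build_context(chunk: Chunk) -> str:
    """Hypothetical stand-in for the step that produces chunk context.

    A fixed template keeps the sketch runnable offline.
    """
    return f"This excerpt is from the document '{chunk.doc_title}'."


def contextualize(chunk: Chunk) -> str:
    """Prepend the context to the chunk text.

    The returned string, not the raw chunk, is what would be
    embedded or indexed.
    """
    return f"{build_context(chunk)}\n{chunk.text}"


# Illustrative sample data, not taken from the article.
chunks = [
    Chunk("ACME Q2 2023 report", "Revenue grew 3% over the previous quarter."),
]
enriched = [contextualize(c) for c in chunks]
print(enriched[0])
```

On its own, the raw chunk does not say which company or period it refers to; the prepended line restores that information before the text reaches the embedding model or search index.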
-
How to Develop a Culture of Quality in Software Organizations
According to Erika Chestnut, software organizations can develop a culture of quality with a clear commitment from leadership, not only to endorse quality efforts in software teams, but also to actively champion them. This commitment and advocacy should manifest in data-driven decision-making that strikes a balance between innovation and quality, ensuring that we maintain the highest quality.
-
Cloudflare One Data Protection Suite for Data Security across Web, Private, and SaaS Applications
Cloudflare recently announced its One Data Protection Suite, a unified set of advanced security solutions designed to protect data across every environment: web, private, and SaaS applications. The company states the suite is powered by Cloudflare’s Security Service Edge (SSE), allowing customers to streamline compliance in the cloud and mitigate data exposure and loss of source code.
-
6 Tracks Not to Miss at QCon San Francisco, October 2-6, 2023: ML, Architecture, Resilience & More!
At InfoQ’s international software development conference, QCon San Francisco (October 2-6, 2023), senior software practitioners driving innovation and change in software development will explore real-world architectures, technology, and techniques to help you solve your own challenges.
-
QCon New York: Five Tracks to Level-up on the Latest Software Development Practices
The 2023 edition of the QCon New York (June 13-15) software development conference, hosted by InfoQ, is set to bring together over 800 senior software developers. The three-day conference will feature over 80 innovative senior software practitioners from early adopter companies sharing how they are solving current challenges, providing new ideas and perspectives across multiple domains.
-
Zero-Copy In-Memory Sharing of Large Distributed Data: V6d
Vineyard (v6d), a zero-copy, in-memory data manager, is maintained as a CNCF sandbox project and provides distributed operators that can be used to share immutable data within or across cluster nodes. V6d is particularly of interest for deep network training on big (sharded) datasets, such as those used for large language and graph models.