Data science

Data science
fromComputerworld
19 hours ago

Solving world hunger with data

Curiosity and technical instincts can enable a transition from software development to data leadership, with risk-taking leading to long-term career payoff.
Data science
fromMedium
1 week ago

Orchestrating RAG pipelines with Apache Airflow

Apache Airflow provides flexible, reliable orchestration for production GenAI pipelines, enabling tool-agnostic, extensible, retry-capable workflows for embeddings, vector storage, and query pipelines.
Data science
fromTheregister
4 days ago

Neo4j intros 'property sharding' to tackle scalability

Infinigraph's property sharding enables horizontally scalable graph storage while preserving traversal performance and supporting both transactional and analytical workloads on a single system.
fromBusiness Matters
6 days ago

Why Every Trader Needs a Crypto Backtesting Tool Before Going Live

Trading can be exciting, but it is also unpredictable. Many traders lose money because they start trading live without testing their strategy. This is where backtesting comes in. It allows traders to test their strategies on historical trading data before risking real money. By understanding how a strategy would have worked in different market conditions, traders can make smarter decisions and reduce risks.
Data science
Data science
fromESPN.com
6 days ago

Matchup rankings: Drake Maye, Ricky Pearsall stand out in Week 2

Start the player with the superior matchup using schedule-independent Adjusted Fantasy Points Allowed to compare defenses after calibrating for strength of opponents.
Data science
fromMedium
1 week ago

Basics of Big Data and Streaming

Scala, Spark, Kafka, and Amazon EMR together enable scalable, high-performance batch and real-time big data processing pipelines.
#ai
fromInfoWorld
1 month ago
Data science

Orchestrating AI-driven data pipelines with Azure ADF and Databricks: An architectural evolution

Data science
fromABC7 Los Angeles
6 days ago

See how your cost of living has changed with the ABC Price Tracker

Interactive Price Tracker shows decade-long, region-specific prices for essentials across the 100 largest U.S. metro areas and updates automatically with the latest data.
#data-strategy
fromMedium
1 week ago

You might be a victim of corrupt personalization

Netflix emphasizes that the more you use the platform, the more personalized it will become. Are you sure your feeds (Netflix, Amazon, whatever social media you prefer) are providing you with personalized content? Are you being given content that aligns with your actual interests, or is the algorithm steering you around?
Data science
Data science
fromRubyflow
1 week ago

Topical: Topic Modeling Pipeline for Ruby

A Ruby gem that provides a complete topic modeling pipeline using ClusterKit clustering and c-TF-IDF, combining Rust performance with Ruby usability.
Data science
fromDATAVERSITY
1 week ago

Women in Data: Meet Andrea Barber - DATAVERSITY

Andrea Barber builds accessible, beginner-focused Python and data analytics resources while advancing women’s empowerment and ethical, equitable use of healthcare data.
#databricks
fromInfoQ
1 week ago

Google Spanner Unifies OLTP and OLAP with Columnar Engine

Google recently introduced a columnar engine for its globally distributed database, Spanner, intending to resolve the long-standing conflict between online transaction processing (OLTP) and analytical query processing (OLAP). The new feature, currently in preview, allows Spanner (Enterprise and Enterprise Plus editions) to handle both workloads simultaneously on a single database, eliminating the need for separate data warehouses and complex ETL (Extract, Transform, Load) pipelines.
Data science
fromInfoWorld
1 week ago

Databot: AI-assisted data analysis in R or Python

Can you create a histogram of game total scores to see the distribution of scoring? Could you make a box plot comparing home vs away team scores? Let's create a scatter plot of temperature vs total score to see if weather affects scoring. Can you show me the distribution of betting spreads and how they relate to actual game results? Could you create a visualization showing win/loss records by team?
Data science
Data science
fromWIRED
1 week ago

Is Congestion Pricing Working? The MTA's Revamped Data Team Is Figuring It Out

MTA's data team published real-time congestion-pricing and vehicle-entry data, centralizing transit datasets to increase transparency and enable public evaluation.
fromBarchart.com
1 week ago

Google Just Surged 9%! Here are 2 Options Trades to Keep Riding the Rally

Data science
fromLondon Business News | Londonlovesbusiness.com
1 week ago

How data is changing the business of sports and fan engagement - London Business News | Londonlovesbusiness.com

Cheering at the stadiums and buying replica jerseys shifted to new ways to consume sports. Live matches on the Sportsbet betting platform, social media, fantasy leagues, highlights, and apps are capturing the attention of today's fans. Teams and brands understand that to keep fans engaged, they need to meet them wherever they are. This triggered an entirely new approach based on data about fans' behaviours, which proved to be just as valuable as the sports themselves.
Data science
fromFlowingData
1 week ago

What counts as rude behavior in public, by age group

Pew Research asked U.S. adults if certain behaviors in public, such as cursing or smoking, were acceptable. The above are the results for four age groups. For every behavior, the percentage of people who said it was rarely or never acceptable increased with age. Television and movies (and my own experiences) would tell you that sounds about right, but for some reason the clear trend surprised me. A quiz with the behaviors lets you get in on the action to see how crotchety you are.
Data science
Data science
fromElectronic Frontier Foundation
2 weeks ago

Open Austin: Reimagining Civic Engagement and Digital Equity in Texas

Open Austin trains Central Texans to build open-source civic technology, scaling a Data Research Hub answering residents' questions for community-driven solutions.
fromInfoWorld
2 weeks ago

From Teradata to lakehouse: Lessons from a real-world data platform modernization

Over the course of several years designing and delivering enterprise data platforms for a global pharmaceutical leader, I witnessed firsthand how data had evolved from a backend enabler to a frontline business asset. The organization was no longer just looking to report historical performance; it needed to predict outcomes, personalize patient engagement, customer engagement, brand performance and make regulatory decisions in near real time.
Data science
Data science
fromSimplilearn.com
2 years ago

Machine Learning Engineer vs. Data Scientist: How Do They Differ? | Simplilearn

Nearly every industry is being disrupted by machine learning and data science.
They're so prevalent that many of us don't even realize how much they've changed our world.
Data science
fromFlowingData
2 weeks ago

Most American and British words

Spoken-word usage shows greater American–British divergence than written language, increasing as more commonly spoken words are emphasized.
Data science
fromInfoWorld
2 weeks ago

Using Cosmos DB in Microsoft Fabric

Cosmos DB integrates with Microsoft Fabric, enabling large-scale analytics of operational data for enterprise AI across diverse data types and familiar data science tools.
Data science
fromZDNET
2 weeks ago

Graph databases are exploding, thanks to the AI boom - here's why

Graph databases are the fastest-growing database category, driven by AI, with projected annual growth rates around 24–26%.
Data science
fromQuansight
2 weeks ago

Expressions are coming to pandas!

Pandas added a new, chainable column-assignment syntax to replace lambda-based patterns, improving predictability, introspection, and safety for dataframe operations.
Data science
fromTechzine Global
2 weeks ago

VMware launches Tanzu Data Intelligence for AI-driven apps

Tanzu Data Intelligence provides an on-premises enterprise lakehouse unifying structured and unstructured data to improve AI readiness and accelerate private-cloud AI agent development.
#data-lakehouse
fromDevOps.com
1 month ago
Data science

StarTree Bridges the Lakehouse Gap: Serving Apache Iceberg Data Directly to Applications - DevOps.com

Data science
fromLondon Business News | Londonlovesbusiness.com
2 weeks ago

Field service math for heavy equipment: How to prove ROI with the right metrics - London Business News | Londonlovesbusiness.com

Field service performance must be driven by validated metrics linking field execution to financial outcomes, focusing on first-time fixes, planned maintenance, and digital tooling.
#big-data
frommedium.com
3 weeks ago
Data science

Complete Guide to Learn Big Data

Learn big data end-to-end: fundamentals, programming, storage, batch/stream processing, ETL, cloud, ML, governance, and hands-on projects with runnable Airflow and PySpark Docker examples.
fromMedium
3 weeks ago
Data science

Why Your Big Data Architecture is Flawed

Data centrality and single-machine memory limits force adoption of new computational toolkits and scalable infrastructure to extract practical value from growing information streams.
Data science
fromBusiness Matters
4 weeks ago

Data-Driven Manager Decisions: From Time Reports to Team Growth

Analytics-driven time tracking transforms workforce management by providing real-time, AI-enhanced insights that optimize productivity, resource allocation, and organizational structure.
Data science
fromFast Company
3 weeks ago

What you can do about the government data that's disappearing

Federal government datasets are disappearing or being altered, undermining statistical trust and prompting archives and researchers to rescue and preserve public data.
Data science
fromTalkpython
3 weeks ago

Accelerating Python Data Science at NVIDIA

RAPIDS enables zero-code GPU acceleration for pandas, scikit-learn, NetworkX, and other Python data libraries, delivering large speedups and scalable GPU-native workflows.
#postgresql
fromHackernoon
2 months ago
Data science

How to Create a Foreign Data Wrapper in PostgreSQL and Aurora PostgreSQL on AWS RDS | HackerNoon

Data science
fromMedium
1 month ago

Building Resilient Data Systems: Key Lessons from Veronika Durgin

Neglected data engineering tasks are crucial for stable and agile data pipelines.
fromHackernoon
4 months ago

Stationarity and Correlation Insights from VAR Modeling of Gas Base Fees | HackerNoon

The ADF test results confirm that both the gas base fee and blob gas base fee time series are stationary, with test statistics of -6.3719 and -10.5237.
Data science
Data science
fromDigiday
1 month ago

In AI and data, WPP Media revives a playbook it thinks it can finally win

WPP Media focuses on leveraging extensive data to differentiate itself in a competitive market.
#data-integration
fromHackernoon
2 months ago
Data science

Kishore's Leadership in STIBO MDM & Strategic AI Implementation at a Major Healthcare Organization | HackerNoon

fromHackernoon
1 year ago
Data science

A Developer's Guide to SeaTunnel and Hive Integration with Real-World Configs | HackerNoon

fromTechzine Global
1 month ago

Snowflake launches Snowpark Connect to run Spark code natively

Snowpark Connect facilitates Apache Spark code execution directly within Snowflake warehouses, eliminating the need for separate Spark clusters and associated complexities like data movement.
Data science
#snowflake
Data science
fromHackernoon
2 years ago

How a Startup Using Gremlin Beat Everyone to Google's Door | HackerNoon

Google's acquisition of Wiz for $32 billion signifies a decisive victory in the cloud security sector.
Data science
fromIT Pro
1 month ago

Are geothermal data centers just hot air?

Geothermal energy is a reliable renewable source for powering large-scale data centers, particularly for high-density AI workloads.
Data science
fromTechzine Global
1 month ago

Scale Computing and Veeam now deliver full backup integration

Veeam's backup software integrates with Scale Computing's virtualization platform, enabling agentless hypervisor backup.
fromHackernoon
4 months ago

5 Major Business Mistakes When Working with Big Data: Lessons from a Company Managing 16 TB of Data | HackerNoon

Over a quarter of data and analytics professionals worldwide estimate that poor-quality data costs companies over $5 million annually, with 7% putting the figure at $25 million or more.
Data science
Data science
fromInfoWorld
1 month ago

Google updates agents in BigQuery to further automate analytics tasks

Google enhances BigQuery with a new code interpreter and advanced analytics features, improving automation in data engineering and data science tasks.
#data-centers
fromInfoWorld
1 month ago

Apache Flink integrates AI for real-time decision-making

With the 2.1 release, Apache Flink also now supports Process Table Functions (PTFs), the most powerful kind of function for Flink SQL and Table API.
Data science
Data science
fromMarTech
1 month ago

Messy data is your secret weapon - if you know how to use it | MarTech

Recent advances in AI enable effective analysis of messy, unstructured data, challenging the long-held belief that data must be clean.
#data-management
fromInfoQ
1 month ago
Data science

Building Reproducible ML Systems with Apache Iceberg and SparkSQL: Open Source Foundations

Data science
fromInfoWorld
1 year ago

What is Microsoft Fabric? A big tech stack for big data

Microsoft Fabric is a comprehensive cloud-based analytics suite integrating various Microsoft components for diverse roles.
Data science
fromMedium
2 months ago

Scaling AI Responsibly: Lessons in Efficiency, Flexibility, and Platform Design

AI tooling development must prioritize speed and user-centric solutions to drive real-world impact.
Data science
fromTechzine Global
1 month ago

Comeback of LTO tape: market grew significantly in 2024

LTO tape market experienced significant growth in 2024, with 176.5 exabytes of compressed capacity introduced, marking a 15.4% increase from 2023.
Data science
fromNew Relic
1 month ago

Database Performance Monitoring - Now GA: Deep Query Analysis

Enhanced Database Performance Monitoring enables direct query-level insights, improving DBAs' ability to manage database performance.
fromTechzine Global
1 month ago

Ataccama underlines AI data lineage for business users

Ataccama closes that gap by turning complex data logic into plain language. Business users can now trace a data point's origin and understand how it was profiled or flagged without relying on technical experts.
Data science
fromHackernoon
6 months ago

Redefining Data Operations With Data Flow Programming in CocoIndex | HackerNoon

In traditional systems, side effects lead to increased complexity, debugging challenges, and unpredictable behavior. CocoIndex adopts a pure data flow programming approach, ensuring reliability.
Data science
Data science
fromHackernoon
2 months ago

Effective Data Chunking and Querying with Pinecone and GPT-4o | HackerNoon

Optimizing data ingestion in Pinecone involves preprocessing markdown and splitting articles into fixed-length chunks for improved relevance.
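As a rough illustration of the fixed-length chunking the summary mentions, the sketch below splits a string into equal-sized character windows. The function name `chunk_text`, the character-based sizing, and the optional `overlap` parameter are assumptions made for this example and are not taken from the linked article or from the Pinecone API.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 0) -> list[str]:
    """Split text into fixed-length character chunks.

    `overlap` (hypothetical, not from the article) repeats the last
    `overlap` characters of one chunk at the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between chunk start positions
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4):
        print(chunk)
```

A real ingestion pipeline would more likely size chunks in tokens than in characters; characters are used here only to keep the example dependency-free.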
Data science
fromInfoWorld
1 year ago

Snowflake updates developer tools, adds observability features

Snowflake introduces Trail for enhanced observability in data management workflows.
Data science
fromMedium
2 months ago

The Data Science Playbook: Exploring Sports Analytics Through Real Datasets

Data analytics has become central to competitive advantage in sports, influencing coaching, player evaluation, and fan experience.
Data science
fromHackernoon
2 years ago

Why No Single Algorithm Solves Deduplication - and What to Do Instead | HackerNoon

Detecting duplicate entities at scale requires efficient methods to reduce comparisons and maintain high recall.
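One common way to reduce comparisons is blocking: records are grouped by a cheap key, and only records that share a key are compared. The sketch below is a minimal illustration of that idea and is not necessarily the method the linked article describes; `candidate_pairs` and the three-letter prefix key are hypothetical choices for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Hashable, Sequence


def candidate_pairs(
    records: Sequence[str],
    key: Callable[[str], Hashable],
) -> set[tuple[int, int]]:
    """Return index pairs of records that fall into the same block."""
    blocks: dict[Hashable, list[int]] = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[key(record)].append(idx)

    pairs: set[tuple[int, int]] = set()
    for members in blocks.values():
        # Only records inside one block are ever compared with each other.
        pairs.update(combinations(members, 2))
    return pairs


if __name__ == "__main__":
    names = ["Acme Corp", "acme corporation", "Beta LLC", "ACME Inc"]
    # Hypothetical blocking key: lowercase three-letter prefix.
    print(sorted(candidate_pairs(names, key=lambda s: s[:3].lower())))
```

The trade-off is recall: a key that is too strict puts true duplicates into different blocks, so they are never compared.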
fromInfoWorld
1 year ago

What's new in MySQL 9.0

MySQL 9.0.0 introduces a new Vector datatype, JavaScript Stored Programs, updated library versions, and enhancements to the Event Scheduler, while deprecating old SHA-1 security.
Data science
Data science
fromTearsheet
2 months ago

Announcing the winners of Tearsheet's 2025 Data Awards - Tearsheet

Data and data sharing are fundamental to modern finance, with ecosystems built around customer information.
fromTechCrunch
2 months ago

AI is forcing the data industry to consolidate - but that's not the whole story | TechCrunch

There is a complete reset in how data is managed and flows around the enterprise. If people want to seize the AI imperative, they have to redo their data platforms in a very big way. And this is where I believe you're seeing all these data acquisitions, because this is the foundation to have a sound AI strategy.
Data science
fromClickUp
2 months ago

Venn Diagram Alternatives for Data Visualization in 2025 | ClickUp

Venn diagrams use overlapping circles to show the relationship between two or more things, facilitating comparisons across various fields.
Data science
Data science
fromHackernoon
4 years ago

What If Your 'Messy' Data Is Actually Perfect? | HackerNoon

Success Metrics layer guides transformation by defining what success looks like and how to recognize achievement.
Data science
fromIT Pro
2 months ago

How can businesses handle data sprawl?

Data sprawl and content sprawl create significant challenges for organizations due to unstructured data growth and lack of governance.
Data science
fromHackernoon
2 years ago

Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics | HackerNoon

The MS MARCO dataset reveals considerable multilingual disparity and significant data skew, highlighting challenges in model evaluation and training.
fromHackernoon
2 months ago

How to Write Complex Queries in Apache Spark SQL Using CTE (WITH Clause) | HackerNoon

A Common Table Expression (CTE) is a named, temporary result set defined within a single SQL statement, which helps in improving query readability and maintainability.
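To make the definition concrete, the sketch below runs a query with a `WITH` clause. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a Spark cluster; the `orders` table, its columns, and the `top_customers` helper are made up for this example. The `WITH ... AS (...)` form shown is standard SQL, but this is not code from the linked article.

```python
import sqlite3


def top_customers(rows: list[tuple[str, float]], min_total: float) -> list[tuple[str, float]]:
    """Return (customer, total) for customers whose order total is >= min_total."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            WITH customer_totals AS (      -- named, temporary result set
                SELECT customer, SUM(amount) AS total
                FROM orders
                GROUP BY customer
            )
            SELECT customer, total
            FROM customer_totals           -- the CTE is used like a table
            WHERE total >= ?
            ORDER BY total DESC
        """
        return conn.execute(query, (min_total,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ann", 10.0), ("bob", 5.0), ("ann", 7.5)]
    print(top_customers(sample, min_total=10))
```

The CTE exists only for the duration of the single statement, which is what distinguishes it from a temporary table or view.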
Data science
Data science
fromESPN.com
2 months ago

NHL draft grades: From the excellent (Islanders, Hurricanes) to the confusing (Maple Leafs)

The 2025 NHL draft drew criticism for its lengthy process and decentralized format, prompting calls for a return to centralized drafting.
fromMedium
3 months ago

Frequent Spark Interview Questions Part 2

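Both cache() and persist() store an RDD/DataFrame/Dataset in memory (or on disk) to avoid recomputation. For RDDs, cache() is shorthand for persist(StorageLevel.MEMORY_ONLY); for DataFrames and Datasets, cache() defaults to MEMORY_AND_DISK. persist() offers more control by accepting an explicit storage level.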
Data science
Data science
fromDevOps.com
2 months ago

DataOps and Automation: The Future of Database Management - DevOps.com

Implementing DataOps can significantly enhance deployment velocity by automating database operations, reducing errors and manual delays.
Data science
fromTheregister
2 months ago

A trip through vintage datacenter networking

The evolution of datacenter networking has transformed from proprietary systems to complex modern technologies.
Early networking was defined by compatibility issues and manufacturer-specific protocols.
Data science
fromInfoWorld
2 months ago

Teradata aims to simplify on-premises AI for data scientists with AI Factory

Teradata's AI Factory simplifies on-prem AI lifecycle management, reducing reliance on hybrid solutions and improving data sovereignty.
#apache-spark
fromMedium
2 months ago
Data science

RDD vs DataFrame vs Dataset in Apache Spark: Which One Should You Use and Why

Understanding Spark's APIs—RDD, DataFrame, and Dataset—saves time and boosts efficiency in big data processing.
fromMedium
2 months ago
Data science

Leveraging Broadcast Joins in Apache Spark (Scala)

Broadcast joins optimize Spark for faster dataset joins by broadcasting smaller datasets, avoiding costly shuffle operations.
fromwww.theguardian.com
2 months ago

Antarctic ice has grown again but this does not buck overall melt trend

Antarctic ice gained mass from 2021 to 2023, showing climate change follows a jagged path with temporary gains amid long-term losses.
Data science