Sunday, June 8, 2025
News PouroverAI

Posit AI Blog: News from the sparkly-verse

April 22, 2024
in AI Technology


Highlights

sparklyr and friends have received some important updates in the past few
months. Here are some highlights:

spark_apply() now works on Databricks Connect v2

sparkxgb is coming back to life

Support for Spark 2.3 and below has ended

pysparklyr 0.1.4

spark_apply() now works on Databricks Connect v2. The latest pysparklyr
release uses the rpy2 Python library as the backbone of the integration.

Databricks Connect v2 is based on Spark Connect. At this time, it supports
Python user-defined functions (UDFs), but not R user-defined functions.
Using rpy2 circumvents this limitation. As shown in the diagram, sparklyr
sends the R code to the locally installed rpy2, which in turn sends it
to Spark. The rpy2 installed in the remote Databricks cluster then runs
the R code.


Figure 1: R code via rpy2

A big advantage of this approach is that rpy2 supports Arrow. In fact, it
is the recommended Python library to use when integrating Spark, Arrow and
R. This means that the data exchange between the three environments will be
much faster!

As in the original implementation, schema inference works, and it still
carries a performance cost. But unlike the original, this implementation
returns a ‘columns’ specification that you can use the next time you run
the call.

spark_apply(
  tbl_mtcars,
  nrow,
  group_by = "am"
)

#> To increase performance, use the following schema:
#> columns = "am double, x long"

#> # Source:   table<`sparklyr_tmp_table_b84460ea_b1d3_471b_9cef_b13f339819b6`> [2 x 2]
#> # Database: spark_connection
#>      am     x
#>   <dbl> <dbl>
#> 1     0    19
#> 2     1    13
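
The suggested schema can be passed back through the columns argument of spark_apply(), so that later runs skip the inference step. A minimal sketch, assuming tbl_mtcars is the same remote table used above and that the Databricks Connect session is still open:

```r
library(sparklyr)

# Assumption: `tbl_mtcars` already exists as a remote table, as in the
# call above. The schema string is copied from the message printed by
# the first run.
spark_apply(
  tbl_mtcars,
  nrow,
  group_by = "am",
  columns = "am double, x long"
)
```

With the schema supplied, the call should return the same two-row result without the schema message.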

A full article about this new capability is available here:
Run R inside Databricks Connect

sparkxgb

sparkxgb is an extension of sparklyr that enables integration with
XGBoost. The current CRAN release does not support the latest versions of
XGBoost. This limitation has recently prompted a full refresh of sparkxgb.
Here is a summary of the improvements, which are currently in the
development version of the package:

The xgboost_classifier() and xgboost_regressor() functions no longer
pass values for two arguments. These were deprecated by XGBoost and
cause an error if used. In the R functions, the arguments remain for
backwards compatibility, but generate an informative error if not left NULL.

Updates the JVM version used during the Spark session. It now uses
xgboost4j-spark version 2.0.3, instead of 0.8.1. This gives us access to
XGBoost’s most recent Spark code.

Updates code that used deprecated functions from upstream R dependencies. It
also stops using an unmaintained package (forge) as a dependency. This
eliminated all of the warnings that appeared when fitting a model.

Major improvements to package testing. Unit tests were updated and expanded,
the way sparkxgb automatically starts and stops the Spark session for testing
was modernized, and the continuous integration tests were restored. This will
ensure the package’s health going forward.

remotes::install_github("rstudio/sparkxgb")

library(sparkxgb)
library(sparklyr)
library(dplyr) # provides select() and starts_with()

# Start a local Spark session and copy the iris data into it
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris)

# Fit a three-class XGBoost classifier on all predictors
xgb_model <- xgboost_classifier(
  iris_tbl,
  Species ~ .,
  num_class = 3,
  num_round = 50,
  max_depth = 4
)

# Score the training data and inspect the predictions
xgb_model %>%
  ml_predict(iris_tbl) %>%
  select(Species, predicted_label, starts_with("probability_")) %>%
  dplyr::glimpse()
#> Rows: ??
#> Columns: 5
#> Database: spark_connection
#> $ Species                <chr> "setosa", "setosa", "setosa", "setosa", "setosa…
#> $ predicted_label        <chr> "setosa", "setosa", "setosa", "setosa", "setosa…
#> $ probability_setosa     <dbl> 0.9971547, 0.9948581, 0.9968392, 0.9968392, 0.9…
#> $ probability_versicolor <dbl> 0.002097376, 0.003301427, 0.002284616, 0.002284…
#> $ probability_virginica  <dbl> 0.0007479066, 0.0018403779, 0.0008762418, 0.000…

sparklyr 1.8.5

The new version of sparklyr does not have user-facing improvements. But
internally, it has crossed an important milestone: support for Spark version
2.3 and below has effectively ended. The Scala code needed to support those
versions is no longer part of the package. As per Spark’s versioning policy,
Spark 2.3 reached ‘end-of-life’ in 2018.
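
To check whether a given Spark version still falls within the supported range, a comparison along these lines may help. This is only a sketch: is_supported_spark() is a hypothetical helper, not part of sparklyr, and the 2.4.0 cutoff is inferred from the statement that support for 2.3 and below has ended.

```r
# Hypothetical helper (not part of sparklyr): returns TRUE when the
# version is newer than the Spark 2.3.x line.
is_supported_spark <- function(version) {
  # as.numeric_version() accepts strings as well as version objects
  as.numeric_version(version) >= "2.4.0"
}

is_supported_spark("2.3.4")
is_supported_spark("3.5.0")
```

With an open connection, the value returned by sparklyr's spark_version(sc) could be passed to the helper instead of a literal string.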

This is part of a larger, ongoing effort to make the immense code-base of
sparklyr a little easier to maintain, and hence reduce the risk of failures.
As part of the same effort, the number of upstream packages that sparklyr
depends on has been reduced. This has been happening across multiple CRAN
releases, and in this latest release tibble and rappdirs are no longer
imported by sparklyr.


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Citation

For attribution, please cite this work as

Ruiz (2024, April 22). Posit AI Blog: News from the sparkly-verse. Retrieved from https://blogs.rstudio.com/tensorflow/posts/2024-04-22-sparklyr-updates/

BibTeX citation

@misc{sparklyr-updates-q1-2024,
author = {Ruiz, Edgar},
title = {Posit AI Blog: News from the sparkly-verse},
url = {https://blogs.rstudio.com/tensorflow/posts/2024-04-22-sparklyr-updates/},
year = {2024}
}


