(Development version.)
tidy_gower() — eliminated two layers of redundant work in the pairwise
distance loop:

Per-column ranges (max - min) and ordinal rank vectors were previously
recomputed on every (i, j) pair. They are now computed once in a
pre-pass, reducing work from O(n² × p) to O(n² + p).

Replaced data[i, k] — which dispatches to
the R-level [.data.frame method on every call — with pre-extracted
plain-vector access col_vecs[[k]][i], which resolves at the C level.
Benchmarks show 10–100× faster scalar access; the gain compounds across
the full n*(n-1)/2 * p iterations.

Column type checks (is.numeric, is.ordered) are now resolved once into a
col_type character vector, removing repeated S3 predicate calls from
the inner loop.

Fixed tl_reduce_dimensions() returning the internal .obs_id row
identifier as a column of its $data result. Passing that data to a
supervised model via a response ~ . formula fed .obs_id in as a
high-cardinality predictor, which made tree-based fits effectively
non-terminating. The identifier is now dropped from the returned data,
consistent with how the pipeline and transfer-learning paths already
handle it.

Fixed print() and summary() erroring on the model objects returned
by tl_step_selection() and tl_tune_xgboost(). Both constructed their
object without the spec$paradigm field or the tidylearn_supervised
class, so the print method hit a zero-length if condition and
summary() took the unsupervised branch. Both objects are now built
consistently with tl_model().

Fixed tidy_gower() (and tidy_dist(..., method = "gower")) erroring on
single-row input. The pairwise loop used 1:(n - 1), which produces the
invalid sequence 1:0 when n is 1; it now uses seq_len(n - 1), so a
single-row data frame returns an empty dist object, consistent with
stats::dist().

Added tests for tidy_gower() / tidy_dist(..., method = "gower")
covering: return type and metadata, symmetry and self-distance, identical
rows, hand-verified numeric / categorical / ordered / mixed-type distances,
NA skipping, custom weights, constant-column denominator behaviour, and
single-row input.

Removed unused packages from Suggests (caret, mclust, onnx,
parsnip, recipes, reticulate, workflows) — none were referenced in
package code, tests, or vignettes.

tl_read() Family

tl_read() dispatcher function — auto-detects format from file
extension, URL pattern, or connection string and routes to the appropriate
reader

tidylearn_data object, a tibble subclass carrying
source, format, and timestamp metadata via print.tidylearn_data()

tl_read_csv() / tl_read_tsv() — via readr with base R fallback

tl_read_excel() — .xls, .xlsx, .xlsm files via readxl

tl_read_parquet() — via nanoparquet

tl_read_json() — tabular JSON via jsonlite

tl_read_rds() / tl_read_rdata() — native R formats via base R

tl_read_db() — query any live DBI connection

tl_read_sqlite() — auto-connect to SQLite files via RSQLite

tl_read_postgres() — connection string or named params via RPostgres

tl_read_mysql() — connection string or named params via RMariaDB

tl_read_bigquery() — Google BigQuery via bigrquery

tl_read_s3() — download and read from S3 URIs via paws.storage

tl_read_github() — download raw files from GitHub repositories

tl_read_kaggle() — download datasets via the Kaggle CLI

tl_read() accepts a character vector of paths — reads each and row-binds
with a source_file column

tl_read_dir() — scan a directory for data files with optional format,
pattern, and recursive filtering

tl_read_zip() — extract and read from zip archives, with optional file
selection

tl_check_packages()

tl_read() in the workflow

Fixed tl_transfer_learning() hanging indefinitely when used with PCA
pre-training. The .obs_id row-identifier column from PCA output was
being included in the supervised formula, creating a massive dummy-variable
matrix. The column is now stripped before both training and prediction.

Fixed tl_run_pipeline() failing with "attempt to select less than one
element" when all cross-validation metrics were NA. Root cause: scale()
returned matrix columns instead of vectors, causing downstream metric
computation to produce NaN. Added as.vector() wrapper and hardened the
best-model selection to handle all-NA metric values gracefully.

Fixed tl_auto_ml() time budget enforcement. The budget now controls
which models are attempted: budgets under 30s skip slow C-level models
(forest, SVM, XGBoost) entirely, and cross-validation is skipped when
remaining time is tight. Baseline model order changed to fast-first
(tree, logistic/linear, then forest). See ?tl_auto_ml for full details
on budget tiers.

Fixed tl_interaction_effects() crashing with "unused argument (se.fit)"
because tidylearn's predict() method does not support se.fit. Now uses
stats::predict() on the raw model object for confidence intervals. Also
fixed an invalid formula in the internal slope calculation.

Fixed tl_plot_interaction() expecting fit/lwr/upr columns from
predict() output. Now correctly handles tidylearn's .pred tibble
format.

Fixed tl_plot_intervals() calling the non-existent tl_prediction_intervals()
function. Now computes confidence and prediction intervals directly via
stats::predict(..., interval = "confidence") and
stats::predict(..., interval = "prediction").

Fixed tl_plot_svm_boundary() erroring with "at least two predictor
variables required" when using response ~ . formulas. The function now
resolves predictors from data column names instead of all.vars(), which
does not expand the dot (.). Also switched from geom_contour_filled (which
failed on discrete class predictions) to geom_raster.

Fixed tl_plot_svm_tuning() passing NULL entries in the ranges list
to e1071::tune(), which caused "NA/NaN/Inf in foreign function call"
errors. Tuning ranges are now built conditionally based on the kernel type.

Fixed tl_plot_xgboost_shap_summary() failing with "arguments imply
differing number of rows" when n_samples differed from nrow(data).
Sampling is now performed before SHAP computation so that feature values
and SHAP values always have the same number of rows.

Fixed tl_check_assumptions() crashing with "list object cannot be
coerced to logical" when some assumption checks returned NULL (e.g.,
when optional test packages were not installed).

Fixed the gamma calculation to use predictor count only
(1 / (ncol(data) - 1)) instead of including the response column.

Added @return tag to print.tidylearn_data().

Replaced the size parameter with linewidth in all
geom_line() calls across visualization, classification, PCA, DBSCAN,
and validation plotting functions.

tl_default_param_grid,
tl_tune_grid, tl_tune_random, tl_plot_tuning_results, and input
validation.

Replaced 1:n patterns with seq_len() / seq_along().

lintr configuration enforcing %>% pipe consistency

tl_table() dispatcher function — mirrors plot() but produces
formatted gt tables instead of ggplot2 visualisations

tl_table_metrics() — styled evaluation metrics table from tl_evaluate()

tl_table_coefficients() — model coefficients with p-values (lm/glm) or
sorted by magnitude (glmnet), with conditional highlighting

tl_table_confusion() — confusion matrix with correct predictions
highlighted on the diagonal

tl_table_importance() — ranked feature importance with colour gradient

tl_table_variance() — PCA variance explained with cumulative % coloured

tl_table_loadings() — PCA loadings with diverging red–blue colour scale

tl_table_clusters() — cluster sizes and mean feature values for kmeans,
pam, clara, dbscan, and hclust models

tl_table_comparison() — side-by-side multi-model comparison table

gt theme via internal
tl_gt_theme() helper

gt is a suggested dependency — functions error with an install message if
gt is not available

Fixed tl_fit_dbscan() returning a non-existent core_points field
instead of summary from the underlying tidy_dbscan() result

Fixed plot() failing on supervised models with
"could not find function 'tl_plot_model'" by implementing the missing
tl_plot_model() and tl_plot_unsupervised() internal dispatchers
(#1)

Fixed tl_plot_actual_predicted(), tl_plot_residuals(), and
tl_plot_confusion() failing due to accessing a non-existent $prediction
column on predict output (correct column is $.pred)

Fixed $prediction column mismatch in the tl_dashboard()
predictions table

tl_model() - Single function to fit 20+ machine learning models

$fit for package-specific functionality

tl_split() - Train/test splitting with stratification support

tl_prepare_data() - Data preprocessing (scaling, imputation, encoding)

tl_evaluate() - Model evaluation with multiple metrics

tl_auto_ml() - Automated machine learning

tl_tune() - Hyperparameter tuning with grid and random search

tidylearn wraps established R packages including: stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, dbscan, MASS, and smacof.
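
To illustrate the tidy_gower() entries in the development-version notes (the one-time pre-pass, plain-vector column access, and the seq_len(n - 1) loop bound), here is a minimal sketch in base R. It is not tidylearn's implementation: gower_sketch() is a hypothetical helper that handles only numeric and unordered columns, applies equal weights, assumes at least one row, and returns a plain numeric vector in dist order rather than a dist object.

```r
# Hypothetical helper for illustration only; not tidylearn's tidy_gower().
# Assumes nrow(data) >= 1 and only numeric or unordered (factor/character) columns.
gower_sketch <- function(data) {
  n <- nrow(data)
  p <- ncol(data)

  # Pre-pass: extract columns, types, and ranges once, not once per (i, j) pair.
  col_vecs <- lapply(seq_len(p), function(k) data[[k]])
  is_num   <- vapply(col_vecs, is.numeric, logical(1))
  ranges   <- vapply(seq_len(p), function(k) {
    if (is_num[k]) as.numeric(diff(range(col_vecs[[k]], na.rm = TRUE))) else NA_real_
  }, numeric(1))

  d   <- numeric(n * (n - 1) / 2)
  idx <- 0L
  # seq_len(n - 1) is empty when n == 1; 1:(n - 1) would instead give c(1, 0).
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      total <- 0
      used  <- 0
      for (k in seq_len(p)) {
        a <- col_vecs[[k]][i]  # plain-vector access instead of data[i, k]
        b <- col_vecs[[k]][j]
        if (is.na(a) || is.na(b)) next  # skip variables with a missing value
        contrib <- if (is_num[k]) {
          # In this sketch a constant column (range 0) contributes zero distance.
          if (ranges[k] > 0) abs(a - b) / ranges[k] else 0
        } else {
          as.numeric(a != b)  # simple 0/1 mismatch for unordered values
        }
        total <- total + contrib
        used  <- used + 1
      }
      idx <- idx + 1L
      d[idx] <- if (used > 0) total / used else NA_real_
    }
  }
  d
}

df <- data.frame(x = c(0, 5, 10), g = factor(c("a", "a", "b")))
gower_sketch(df)                     # 0.25 1.00 0.75
gower_sketch(df[1, , drop = FALSE])  # numeric(0)
```

The single-row call returns an empty vector because the outer loop never runs; with 1:(n - 1) the same call would iterate over i = 1 and i = 0 and index past the data.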