Changes in version 0.3.1.9000

(Development version.)

Changes in version 0.3.1 (2026-05-19)

Performance

- tidy_gower(): eliminated two layers of redundant work in the pairwise distance loop.
  - Column ranges (max - min) and ordinal rank vectors were previously recomputed on every (i, j) pair. They are now computed once per column in a pre-pass, reducing the number of range and rank computations from O(n² × p) to O(p).
  - Replaced scalar data-frame indexing data[i, k] (which dispatches to the R-level [.data.frame method on every call) with pre-extracted plain-vector access col_vecs[[k]][i], which resolves at the C level. Benchmarks show 10–100× faster scalar access; the gain compounds across the full n*(n-1)/2 * p iterations.
- Column types (is.numeric, is.ordered) are now resolved once into a col_type character vector, removing repeated S3 predicate calls from the inner loop.

Bug Fixes

- Fixed tl_reduce_dimensions() returning the internal .obs_id row identifier as a column of its $data result. Passing that data to a supervised model via a response ~ . formula fed .obs_id in as a high-cardinality predictor, which made tree-based fits effectively non-terminating. The identifier is now dropped from the returned data, consistent with how the pipeline and transfer-learning paths already handle it.
- Fixed print() and summary() erroring on the model objects returned by tl_step_selection() and tl_tune_xgboost(). Both constructed their object without the spec$paradigm field or the tidylearn_supervised class, so the print method hit a zero-length if condition and summary() took the unsupervised branch. Both objects are now built consistently with tl_model().
- Fixed tidy_gower() (and tidy_dist(..., method = "gower")) erroring on single-row input. The pairwise loop used 1:(n - 1), which evaluates to the descending sequence c(1, 0) rather than an empty one when n is 1; it now uses seq_len(n - 1), so a single-row data frame returns an empty dist object, consistent with stats::dist().
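
The pre-pass and the single-row fix described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not tidylearn's implementation: toy_gower() is a hypothetical helper that handles numeric columns only, ignores NA values and weights, and lets a constant column contribute zero distance while still counting in the denominator (a simplification chosen here). The name col_range is also an assumption; col_vecs follows the entry above.

```r
# Hypothetical helper for illustration only; not tidylearn's tidy_gower().
# Assumes at least one row, at least one column, all columns numeric, no NAs.
toy_gower <- function(data) {
  n <- nrow(data)
  p <- ncol(data)

  # Pre-pass: pull each column out as a plain vector and compute its
  # range once, instead of once per (i, j) pair.
  col_vecs <- lapply(data, as.numeric)
  col_range <- vapply(col_vecs, function(x) max(x) - min(x), numeric(1))

  d <- numeric(n * (n - 1) / 2)
  idx <- 0L

  # Lower triangle in column-major order, the layout a dist object uses.
  # seq_len(n - 1) is empty when n is 1, so the loop body never runs;
  # 1:(n - 1) would instead iterate over c(1, 0).
  for (j in seq_len(n - 1)) {
    for (i in (j + 1):n) {
      total <- 0
      for (k in seq_len(p)) {
        if (col_range[k] > 0) {
          total <- total + abs(col_vecs[[k]][i] - col_vecs[[k]][j]) / col_range[k]
        }
      }
      idx <- idx + 1L
      d[idx] <- total / p
    }
  }

  structure(d, Size = n, class = "dist")
}

df <- data.frame(x = c(0, 5, 10), y = c(1, 1, 3))

d <- toy_gower(df)
as.vector(d)                                   # 0.25 1.00 0.75

single <- toy_gower(df[1, , drop = FALSE])
length(single)                                 # 0, an empty dist object

1:0                                            # 1 0  (counts down)
seq_len(0)                                     # integer(0)
```

The comparison at the end shows why the colon operator is unsafe as a loop bound: it never produces an empty sequence, whereas seq_len() does.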

Tests

- Added 11 tests for tidy_gower() / tidy_dist(..., method = "gower") covering: return type and metadata, symmetry and self-distance, identical rows, hand-verified numeric / categorical / ordered / mixed-type distances, NA skipping, custom weights, constant-column denominator behaviour, and single-row input.

Internal

- Removed seven unused packages from Suggests (caret, mclust, onnx, parsnip, recipes, reticulate, workflows); none were referenced in package code, tests, or vignettes.

Changes in version 0.3.0 (2026-04-09)

New Features

Data Ingestion (tl_read() Family)

- New tl_read() dispatcher function: auto-detects format from file extension, URL pattern, or connection string and routes to the appropriate reader
- All readers return a tidylearn_data object, a tibble subclass carrying source, format, and timestamp metadata, which print.tidylearn_data() displays

File Format Readers

- tl_read_csv() / tl_read_tsv(): via readr with base R fallback
- tl_read_excel(): .xls, .xlsx, .xlsm files via readxl
- tl_read_parquet(): via nanoparquet
- tl_read_json(): tabular JSON via jsonlite
- tl_read_rds() / tl_read_rdata(): native R formats via base R

Database Readers

- tl_read_db(): query any live DBI connection
- tl_read_sqlite(): auto-connect to SQLite files via RSQLite
- tl_read_postgres(): connection string or named params via RPostgres
- tl_read_mysql(): connection string or named params via RMariaDB
- tl_read_bigquery(): Google BigQuery via bigrquery

Cloud/API Readers

- tl_read_s3(): download and read from S3 URIs via paws.storage
- tl_read_github(): download raw files from GitHub repositories
- tl_read_kaggle(): download datasets via the Kaggle CLI

Multi-File Reading

- tl_read() accepts a character vector of paths: reads each and row-binds with a source_file column
- tl_read_dir(): scan a directory for data files with optional format, pattern, and recursive filtering
- tl_read_zip(): extract and read from zip archives, with optional file selection
- All backend packages are suggested dependencies, checked at call time via tl_check_packages()

New Vignette

- Added "Data Ingestion with tidylearn" vignette covering all readers, databases, cloud sources, multi-file reading, and the full pipeline
- Updated "Getting Started" vignette to include tl_read() in the workflow

Bug Fixes

Workflow and Pipeline Fixes

- Fixed tl_transfer_learning() hanging indefinitely when used with PCA pre-training. The .obs_id row-identifier column from PCA output was being included in the supervised formula, creating a massive dummy-variable matrix. The column is now stripped before both training and prediction.
- Fixed tl_run_pipeline() failing with "attempt to select less than one element" when all cross-validation metrics were NA. Root cause: scale() returned matrix columns instead of vectors, causing downstream metric computation to produce NaN. Added as.vector() wrapper and hardened the best-model selection to handle all-NA metric values gracefully.
- Overhauled tl_auto_ml() time budget enforcement. The budget now controls which models are attempted: budgets under 30s skip slow C-level models (forest, SVM, XGBoost) entirely, and cross-validation is skipped when remaining time is tight. Baseline model order changed to fast-first (tree, logistic/linear, then forest). See ?tl_auto_ml for full details on budget tiers.

Interaction and Prediction Fixes

- Fixed tl_interaction_effects() crashing with "unused argument (se.fit)" because tidylearn's predict() method does not support se.fit. Now uses stats::predict() on the raw model object for confidence intervals. Also fixed an invalid formula in the internal slope calculation.
- Fixed tl_plot_interaction() expecting fit/lwr/upr columns from predict() output. Now correctly handles tidylearn's .pred tibble format.

Visualization Fixes

- Fixed tl_plot_intervals() calling non-existent tl_prediction_intervals() function. Now computes confidence and prediction intervals directly via stats::predict(..., interval = "confidence") and stats::predict(..., interval = "prediction").
- Fixed tl_plot_svm_boundary() erroring with "at least two predictor variables required" when using response ~ . formulas. The function now resolves predictors from data column names instead of all.vars(), which does not expand the "." shorthand. Also switched from geom_contour_filled (which failed on discrete class predictions) to geom_raster.
- Fixed tl_plot_svm_tuning() passing NULL entries in the ranges list to e1071::tune(), which caused "NA/NaN/Inf in foreign function call" errors. Tuning ranges are now built conditionally based on the kernel type.
- Fixed tl_plot_xgboost_shap_summary() failing with "arguments imply differing number of rows" when n_samples differed from nrow(data). Sampling is now performed before SHAP computation so that feature values and SHAP values always have the same number of rows.

Other Fixes

- Fixed classification auto-detection silently treating numeric responses with <= 10 unique values as classification. The response must now be a factor or character for classification; a helpful message is emitted when a low-cardinality numeric response is detected.
- Fixed tl_check_assumptions() crashing with "list object cannot be coerced to logical" when some assumption checks returned NULL (e.g., when optional test packages were not installed).
- Fixed SVM default gamma calculation to use predictor count only (1 / (ncol(data) - 1)) instead of including the response column.
- Added missing @return tag to print.tidylearn_data().
- Replaced deprecated ggplot2 size parameter with linewidth in all geom_line() calls across visualization, classification, PCA, DBSCAN, and validation plotting functions.

Tests

- Added test suite for visualization module (26 tests): plot dispatch, regression/classification plots, lift/gain charts, model comparison, unsupervised visualization, and Shiny dashboard.
- Added test suite for tuning module (49 tests): tl_default_param_grid, tl_tune_grid, tl_tune_random, tl_plot_tuning_results, and input validation.
- Added test suite for diagnostics module (75 tests): influence measures, influence plots, assumption checking, and outlier detection across all methods (IQR, z-score, Cook's, Mahalanobis).

Code Quality

- Package-wide lint cleanup: all R source files, tests, and vignettes now pass lintr with zero issues
- Replaced unsafe 1:n patterns with seq_len() / seq_along()
- Removed unused variables across the codebase
- Renamed non-snake_case variables to follow R conventions
- Added .lintr configuration enforcing %>% pipe consistency

Changes in version 0.2.0 (2026-03-16)

New Features

Formatted gt Tables

- New tl_table() dispatcher function: mirrors plot() but produces formatted gt tables instead of ggplot2 visualisations
- tl_table_metrics(): styled evaluation metrics table from tl_evaluate()
- tl_table_coefficients(): model coefficients with p-values (lm/glm) or sorted by magnitude (glmnet), with conditional highlighting
- tl_table_confusion(): confusion matrix with correct predictions highlighted on the diagonal
- tl_table_importance(): ranked feature importance with colour gradient
- tl_table_variance(): PCA variance explained with cumulative % coloured
- tl_table_loadings(): PCA loadings with diverging red–blue colour scale
- tl_table_clusters(): cluster sizes and mean feature values for kmeans, pam, clara, dbscan, and hclust models
- tl_table_comparison(): side-by-side multi-model comparison table
- All table functions share a consistent gt theme via internal tl_gt_theme() helper
- gt is a suggested dependency; functions error with an install message if gt is not available

New Vignette

- Added "Reporting with tidylearn" vignette covering all plot and table functions

Bug Fixes

- Fixed tl_fit_dbscan() returning a non-existent core_points field instead of summary from the underlying tidy_dbscan() result

Changes in version 0.1.1 (2026-03-13)

Bug Fixes

- Fixed plot() failing on supervised models with "could not find function 'tl_plot_model'" by implementing the missing tl_plot_model() and tl_plot_unsupervised() internal dispatchers (#1)
- Fixed tl_plot_actual_predicted(), tl_plot_residuals(), and tl_plot_confusion() failing due to accessing a non-existent $prediction column on predict output (correct column is $.pred)
- Fixed the same $prediction column mismatch in the tl_dashboard() predictions table

Changes in version 0.1.0 (2026-02-06)

Initial CRAN Release

- First release of tidylearn, a unified tidy interface to R's machine learning ecosystem

Features

Unified Interface

- tl_model() - Single function to fit 20+ machine learning models
- Consistent function signatures across all methods
- Tidy tibble output for all results
- Access raw model objects via $fit for package-specific functionality

Supervised Learning Methods

- Linear regression (stats::lm)
- Polynomial regression (stats::lm with poly)
- Logistic regression (stats::glm)
- Ridge, LASSO, elastic net (glmnet)
- Decision trees (rpart)
- Random forests (randomForest)
- Gradient boosting (gbm)
- XGBoost (xgboost)
- Support vector machines (e1071)
- Neural networks (nnet)
- Deep learning (keras, optional)

Unsupervised Learning Methods

- Principal Component Analysis (stats::prcomp)
- Multidimensional Scaling (stats, MASS, smacof)
- K-means clustering (stats::kmeans)
- PAM clustering (cluster::pam)
- CLARA clustering (cluster::clara)
- Hierarchical clustering (stats::hclust)
- DBSCAN (dbscan)

Additional Features

- tl_split() - Train/test splitting with stratification support
- tl_prepare_data() - Data preprocessing (scaling, imputation, encoding)
- tl_evaluate() - Model evaluation with multiple metrics
- tl_auto_ml() - Automated machine learning
- tl_tune() - Hyperparameter tuning with grid and random search
- Unified ggplot2-based visualization functions
- Integration workflows combining supervised and unsupervised learning

Wrapped Packages

tidylearn wraps established R packages including: stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, dbscan, MASS, and smacof.