splitting-datasets
Split datasets into training, validation, and testing sets for ML model development. Use when requesting "split dataset", "train-test split", or "data partitioning".
Version: 1.0.0
Author: Jeremy Longshore <jeremy@intentsolutions.io>
License: MIT
Allowed Tools
Read, Write, Edit, Grep, Glob, Bash(cmd:*)
Provided by Plugin
dataset-splitter
Split datasets for training, validation, and testing
Installation
This skill is included in the dataset-splitter plugin:
/plugin install dataset-splitter@claude-code-plugins-plus
Instructions
# Dataset Splitter
This skill provides automated assistance for dataset-splitting tasks.
## Overview
This skill automates the process of dividing a dataset into subsets for training, validating, and testing machine learning models. It ensures proper data preparation and facilitates robust model evaluation.
## How It Works
1. **Analyze Request**: The skill analyzes the user's request to determine the dataset to be split and the desired proportions for each subset.
2. **Generate Code**: Based on the request, the skill generates Python code utilizing standard ML libraries to perform the data splitting.
3. **Execute Splitting**: The code is executed to split the dataset into training, validation, and testing sets according to the specified ratios (see the sketch after this list).
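For context, a minimal sketch of this two-stage approach, assuming pandas DataFrames and scikit-learn are available (the helper name and default ratios are illustrative, not part of the skill itself):

```python
from sklearn.model_selection import train_test_split

def three_way_split(df, train=0.70, val=0.15, test=0.15, seed=42):
    """Return (train, val, test) DataFrames in the requested proportions."""
    train_df, holdout_df = train_test_split(df, train_size=train, random_state=seed)
    # holdout_df contains val + test, so rescale the test fraction relative to it.
    val_df, test_df = train_test_split(
        holdout_df, test_size=test / (val + test), random_state=seed
    )
    return train_df, val_df, test_df
```

The second call rescales the test fraction (e.g. 0.15 / 0.30 = 0.50) because it operates only on the held-out portion, not the full dataset.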
## When to Use This Skill
This skill activates when you need to:
- Prepare a dataset for machine learning model training.
- Create training, validation, and testing sets.
- Partition data to evaluate model performance.
## Examples
### Example 1: Splitting a CSV file
User request: "Split the data in 'my_data.csv' into 70% training, 15% validation, and 15% testing sets."
The skill will:
1. Generate Python code to read the 'my_data.csv' file.
2. Execute the code to split the data according to the specified proportions, creating 'train.csv', 'validation.csv', and 'test.csv' files, as sketched below.
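The generated code might resemble the following sketch (assuming pandas and scikit-learn; the fixed random_state is illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("my_data.csv")

# 70% for training, 30% held out for validation and testing.
train_df, holdout_df = train_test_split(df, train_size=0.70, random_state=42)

# Split the 30% holdout in half: 15% validation, 15% test overall.
val_df, test_df = train_test_split(holdout_df, test_size=0.50, random_state=42)

train_df.to_csv("train.csv", index=False)
val_df.to_csv("validation.csv", index=False)
test_df.to_csv("test.csv", index=False)
```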
### Example 2: Creating a Train-Test Split
User request: "Create a train-test split of 'large_dataset.csv' with an 80/20 ratio."
The skill will:
1. Generate Python code to load 'large_dataset.csv'.
2. Execute the code to split the dataset into 80% training and 20% testing sets, saving them as 'train.csv' and 'test.csv' (see the sketch below).
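A possible sketch of the generated code for this simpler case (same assumptions as above):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("large_dataset.csv")

# 80% training, 20% testing.
train_df, test_df = train_test_split(df, test_size=0.20, random_state=42)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)
```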
## Best Practices
- **Data Integrity**: Verify that the splitting process maintains the integrity of the data, ensuring no data loss or corruption.
- **Stratification**: Consider stratification when splitting imbalanced datasets to maintain class distributions in each subset.
- **Randomization**: Ensure the splitting process shuffles the data (ideally with a fixed seed for reproducibility) to avoid bias in the resulting datasets; the sketch below combines stratification, shuffling, and a fixed seed.
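As an illustration of the last two points, a stratified, seeded split might look like this (the "label" column name is a hypothetical placeholder for your target column):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("my_data.csv")  # hypothetical dataset with a "label" column

train_df, test_df = train_test_split(
    df,
    test_size=0.20,
    stratify=df["label"],  # preserve class proportions in both subsets
    shuffle=True,          # shuffle rows before splitting to avoid ordering bias
    random_state=42,       # fixed seed so the split is reproducible
)
```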
## Integration
This skill can be integrated with other data processing and model training tools within the Claude Code ecosystem to create a complete machine learning workflow.
## Prerequisites
- Appropriate file access permissions
- Required dependencies installed (typically Python with pandas and scikit-learn)
## Instructions
1. Invoke this skill when the trigger conditions are met
2. Provide necessary context and parameters
3. Review the generated output
4. Apply modifications as needed
## Output
The skill produces the requested split files, for example 'train.csv', 'validation.csv', and 'test.csv'.
## Error Handling
- Invalid input: Prompts for correction (see the validation sketch below)
- Missing dependencies: Lists required components
- Permission errors: Suggests remediation steps
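As a sketch, the generated code might perform basic input validation before splitting (function name and checks are illustrative):

```python
import os

def validate_split_request(path, ratios):
    """Basic checks before generating or running split code (illustrative)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Input dataset not found: {path}")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError(f"Split ratios must sum to 1.0, got {sum(ratios)}")

validate_split_request("my_data.csv", (0.70, 0.15, 0.15))
```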
## Resources
- Project documentation
- Related skills and commands