60 Best ChatGPT Prompts for Data Analysis

Looking for the right ChatGPT prompts for data analysis? This collection of 60 prompts helps you clean messy data, spot trends, and turn raw numbers into clear, decision-ready insights, fast.

Full exploratory data analysis (EDA)

Exploration
prompt template
Act as a senior data analyst. I have a dataset with the following columns: [list your columns and data types]. The dataset contains [X rows] and covers [time period or scope].
Perform a comprehensive EDA that includes:
- A summary of the data (shape, types, missing values, duplicates)
- Descriptive statistics for all numeric columns (mean, median, std, min, max, quartiles)
- Distribution analysis: identify skewed or bimodal variables
- Correlation matrix: highlight the top 5 strongest correlations with [target variable]
- Outlier detection using IQR and z-score methods
- Key patterns, anomalies, or surprising findings
- A prioritized list of 3–5 next analytical steps you recommend
Use Python (pandas, seaborn, matplotlib). Provide clean, commented code and a plain-English summary after each section.
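
To sanity-check the outlier step in whatever code ChatGPT returns, here is a minimal sketch of the IQR and z-score rules using only the Python standard library. The sample values and function names are illustrative assumptions, and the quartile method is the standard library default, which may differ slightly from the one pandas uses.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical sample: seven ordinary readings and one extreme value.
sample = [10, 12, 11, 13, 12, 11, 10, 95]
iqr_flags = iqr_outliers(sample)
z_flags_strict = zscore_outliers(sample, threshold=3.0)
z_flags_loose = zscore_outliers(sample, threshold=2.0)
print(iqr_flags, z_flags_strict, z_flags_loose)
```

In this tiny sample the IQR rule flags the extreme value while the 3-sigma z-score rule does not, which is one reason the prompt asks for both methods.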

Intelligent data cleaning pipeline

Data Cleaning
prompt template
You are a data engineering expert. I have a messy dataset in [CSV / Excel / SQL table] format with these known quality issues:
- Missing values in columns: [list columns]
- Suspected duplicates in: [column or row identifiers]
- Inconsistent formats in: [e.g., date columns, categorical labels]
- Possible outliers in: [numeric columns]
Build a full data cleaning pipeline in Python that:
- Detects and reports missing values with percentage per column
- Imputes missing values using the most appropriate strategy per column and explains why
- Removes or flags exact and fuzzy duplicates
- Standardizes date formats to YYYY-MM-DD
- Normalizes categorical columns (lowercase, strip whitespace, map variants to canonical labels)
- Caps or removes outliers based on IQR or z-score, keeping a change log
- Exports the cleaned dataset and a cleaning summary report
Flag any decisions that require my judgment before applying them.
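
Two of the steps above, date standardization and label normalization, can be sketched with the standard library alone. The format list and the variant mapping below are assumptions for illustration; your own data will need its own lists. Whether an ambiguous date is day-first or month-first is exactly the kind of decision the prompt asks ChatGPT to flag.

```python
from datetime import datetime

# Assumed input formats; extend this list to match your own data.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]

# Hypothetical mapping of label variants to canonical labels.
CANONICAL = {"usa": "united states", "u.s.": "united states", "us": "united states"}

def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # left for manual review rather than guessed

def normalize_label(raw):
    """Lowercase, strip whitespace, and map known variants to one label."""
    cleaned = raw.strip().lower()
    return CANONICAL.get(cleaned, cleaned)

dates = [standardize_date(d) for d in ["2024-03-05", "05/03/2024", "Mar 5, 2024", "n/a"]]
labels = [normalize_label(s) for s in [" USA", "U.S.", "United States ", "Canada"]]
print(dates)
print(labels)
```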

Hypothesis testing with interpretation

Statistics
prompt template
Act as a statistician. I want to test whether [specific hypothesis, e.g., "customers who receive email campaigns convert at a higher rate than those who don't"].
Dataset context:
- Group A (control): [N = X, conversion rate or mean = Y]
- Group B (treatment): [N = X, conversion rate or mean = Y]
- Metric type: [binary / continuous / ordinal]
Do the following:
- Recommend the correct statistical test (t-test, chi-square, Mann-Whitney U, ANOVA, etc.) and justify the choice
- State the null and alternative hypotheses clearly
- Check all test assumptions with code
- Run the test and return: test statistic, p-value, confidence interval, and effect size
- Interpret the result in plain business language
- Warn me of any limitations or risks of the conclusion
Use Python (scipy, statsmodels). Include visualizations of the distributions.
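
For the binary-metric case in the example hypothesis, one test ChatGPT may recommend is a two-proportion z-test. The sketch below shows the arithmetic with the standard library only; the counts are hypothetical, and the pooled standard error assumes large, independent samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 200/4000 control vs. 260/4000 treatment.
z, p_value = two_proportion_ztest(200, 4000, 260, 4000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```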

Strategic data visualization plan

Visualization
prompt template
You are a data visualization expert. I need to present findings from a [type of analysis] to [audience: executive team / technical stakeholders / clients].
My dataset has: [describe key columns and metrics]
My main message is: [the single most important finding]
For each of 5 chart types, tell me:
- What question it answers
- The exact chart type and why
- Which columns go on which axis
- What color encoding or annotation would add insight
- Common mistakes to avoid
Then provide Python code (matplotlib or plotly) for the 2 most impactful charts, with titles, axis labels, and a one-sentence insight annotation on the chart.

Predictive model build and evaluation

ML & Prediction
prompt template
Act as a machine learning engineer. I want to predict [target variable, e.g., customer churn / revenue / equipment failure].
Dataset:
- Features: [list feature names and types]
- Target: [name, type: binary / multiclass / continuous]
- Size: [rows x columns]
- Class imbalance (if classification): [e.g., 90% / 10%]
Build a complete ML pipeline that:
- Splits data into train (70%), validation (15%), and test (15%) sets with stratification
- Preprocesses features in-pipeline
- Trains and compares at least 3 models
- Tunes the best model using cross-validated grid search
- Evaluates on the test set with all relevant metrics and plots
- Explains top 10 features using SHAP values
- Saves the final pipeline as a .pkl file

SQL query optimization for large datasets

SQL & Databases
prompt template
Act as a senior database engineer. I have a SQL query that is running slowly on a table with [X million rows]. Here is the current query: [paste query].
Analyze and optimize it by:
- Identifying all performance bottlenecks (full table scans, missing indexes, unnecessary joins)
- Rewriting the query with optimizations explained line by line
- Recommending indexes to create and explaining the trade-offs
- Estimating the expected performance improvement
- Suggesting query execution plan analysis steps (EXPLAIN / EXPLAIN ANALYZE)
Database: [PostgreSQL / MySQL / BigQuery / Snowflake]. Include before and after versions.

Time series forecasting

Forecasting
prompt template
Act as a time series analyst. I have [X months / years] of [metric, e.g., weekly sales, daily active users] data with columns: [date column, value column, any other relevant columns].
Build a forecasting pipeline that:
- Decomposes the series into trend, seasonality, and residual components
- Tests for stationarity (ADF test) and applies differencing if needed
- Identifies ARIMA parameters using ACF and PACF plots
- Trains and compares: ARIMA, SARIMA, and Prophet
- Evaluates models using RMSE, MAE, and MAPE on a held-out test set
- Generates a [X weeks / months] forecast with 80% and 95% confidence intervals
- Plots actuals vs. forecast with confidence bands
Provide Python code and flag any seasonality patterns or anomalies found.
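
The three evaluation metrics named above are easy to verify by hand. This is a minimal sketch with made-up numbers; skipping zero actuals in MAPE is one common convention, chosen here as an assumption.

```python
from math import sqrt

def forecast_errors(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for paired observations."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    # MAPE is undefined when an actual value is zero, so those points are skipped.
    pct = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct)
    return rmse, mae, mape

# Hypothetical held-out actuals and model forecasts.
actual = [100, 200, 400]
predicted = [110, 180, 400]
rmse, mae, mape = forecast_errors(actual, predicted)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}%")
```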

Customer segmentation analysis

Segmentation
prompt template
You are a customer analytics expert. I have a dataset of [X customers] with the following behavioral and demographic features: [list features].
Perform a full customer segmentation that:
- Selects the most relevant features and explains the selection rationale
- Scales and preprocesses the data appropriately
- Determines the optimal number of clusters using the elbow method and silhouette score
- Runs K-Means and DBSCAN and compares results
- Profiles each segment: size, key characteristics, average metrics
- Visualizes clusters using PCA or UMAP for 2D representation
- Recommends a business action for each segment (e.g., upsell, retain, reactivate)
Use Python (scikit-learn, plotly). Provide both code and a plain-English segment summary.

Cohort retention analysis

Product Analytics
prompt template
Act as a product analyst. I have event-level data with columns: [user_id, event_date, event_type, and any other relevant fields]. I want to measure user retention by cohort.
Build a cohort analysis that:
- Defines cohorts by [first purchase date / signup month / first login]
- Calculates monthly retention rates for each cohort up to [X months]
- Generates a cohort retention heatmap (cohorts as rows, months as columns)
- Identifies which cohorts have the strongest and weakest retention
- Calculates average Day 1, Day 7, and Day 30 retention across all cohorts
- Flags any cohort that drops below [X%] retention threshold and suggests possible causes
- Recommends 2–3 product or lifecycle actions based on the retention pattern
Use Python (pandas, seaborn). Output the heatmap and a written commentary.
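
The core of the cohort table is a small piece of bookkeeping, sketched here without pandas. It assumes cohorts are defined by the month of each user's first event and that retention means "active at least once in that month"; the events are invented for illustration.

```python
from collections import defaultdict
from datetime import date

def cohort_retention(events):
    """events: list of (user_id, date). Returns {cohort: {month_offset: rate}}."""
    first_seen = {}
    for user, day in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, day)  # earliest event defines the cohort

    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        start = first_seen[user]
        offset = (day.year - start.year) * 12 + (day.month - start.month)
        active[(start.year, start.month)][offset].add(user)

    # Retention = users active in month N / users in the cohort (month 0).
    return {
        cohort: {m: len(users) / len(months[0]) for m, users in sorted(months.items())}
        for cohort, months in active.items()
    }

# Hypothetical event log.
events = [
    ("a", date(2024, 1, 5)), ("b", date(2024, 1, 20)),
    ("a", date(2024, 2, 3)), ("c", date(2024, 2, 11)),
    ("a", date(2024, 3, 9)), ("b", date(2024, 3, 1)), ("c", date(2024, 3, 2)),
]
retention = cohort_retention(events)
print(retention)
```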

A/B test results analysis

Experimentation
prompt template
Act as an experimentation analyst. I ran an A/B test with the following setup:
- Experiment name: [name]
- Control group (A): [N = X users, metric = Y]
- Treatment group (B): [N = X users, metric = Y]
- Primary metric: [conversion rate / revenue per user / click-through rate]
- Secondary metrics: [list any]
- Test duration: [X days]
Analyze the results by:
- Checking for sample ratio mismatch (SRM)
- Running the appropriate significance test with correction for multiple metrics
- Calculating p-value, confidence interval, and minimum detectable effect
- Checking for novelty effect or time-based degradation
- Assessing practical significance: is the lift worth shipping?
- Recommending a clear ship / no-ship / run longer decision with justification
Include Python code and a one-page summary suitable for a product review meeting.
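
The SRM check in the first step is usually a chi-square goodness-of-fit test on the group sizes. Here is a minimal sketch for two groups, assuming an intended 50/50 split and a strict alpha of 0.001; both defaults are assumptions you should adjust to your own experiment design.

```python
from math import erfc, sqrt

def srm_check(n_control, n_treatment, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on group sizes."""
    total = n_control + n_treatment
    exp_control = total * expected_ratio
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# Hypothetical group sizes: one healthy split, one suspicious split.
balanced = srm_check(5000, 5010)
skewed = srm_check(5000, 5400)
print(balanced, skewed)
```

If the third value is True, the assignment itself is suspect and the metric comparison should not be trusted until the cause is found.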

Funnel drop-off analysis

Product Analytics
prompt template
You are a conversion optimization analyst. I have funnel data tracking users across [X steps, e.g., Landing Page > Sign Up > Onboarding > First Purchase]. The dataset has columns: [user_id, step_name, timestamp, any segment fields].
Analyze funnel drop-off by:
- Calculating conversion rate at each step overall
- Breaking down conversion by [segment: device / channel / region / user type]
- Identifying the single biggest drop-off point and quantifying the revenue or user impact
- Detecting if drop-off has worsened or improved over [time period]
- Running statistical significance tests on segment-level differences
- Building a Sankey diagram or funnel chart visualization
- Recommending 3 specific, prioritized experiments to improve the worst-performing step
Use Python or SQL. Provide code and a prioritized action plan.
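
The step-by-step conversion math is simple enough to check by hand. This sketch uses hypothetical counts and defines the "biggest drop-off" as the transition that loses the most users in absolute terms; ranking by lowest conversion rate instead is an equally reasonable choice and can give a different answer.

```python
def funnel_report(step_counts):
    """step_counts: ordered list of (step_name, users_reaching_step)."""
    rows = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        rows.append({
            "transition": f"{prev_name} > {name}",
            "conversion": n / prev_n,
            "users_lost": prev_n - n,
        })
    worst = max(rows, key=lambda r: r["users_lost"])
    return rows, worst

# Hypothetical user counts per step.
steps = [
    ("Landing Page", 10000),
    ("Sign Up", 3000),
    ("Onboarding", 2400),
    ("First Purchase", 600),
]
rows, worst = funnel_report(steps)
for row in rows:
    print(row)
```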

Churn prediction model

ML & Prediction
prompt template
Act as an ML engineer specializing in retention. I want to build a churn prediction model for [SaaS / e-commerce / subscription] customers.
Dataset:
- Rows: one per customer
- Features: [list behavioral, usage, demographic, and billing features]
- Label: [churned = 1, active = 0]
- Imbalance: [X% churned]
- Prediction horizon: [e.g., predict churn in next 30 days]
Build a pipeline that:
- Engineers relevant features (recency, frequency, tenure, usage trends)
- Handles class imbalance using SMOTE or class weighting
- Trains Logistic Regression, Random Forest, and XGBoost with cross-validation
- Selects the threshold that maximizes F1 or recall based on business cost of false negatives
- Outputs a scored customer list ranked by churn probability
- Explains drivers for the top 10 highest-risk customers using SHAP force plots
Include code, evaluation metrics, and a decision memo for the retention team.

Revenue attribution analysis

Business Analytics
prompt template
Act as a revenue analyst. I need to attribute [total revenue / MRR / ARR] across [channels / products / regions / sales reps] for [time period]. Dataset columns: [list relevant fields]
Perform attribution analysis that:
- Calculates total and per-unit revenue by each dimension
- Identifies the top 20% of drivers contributing 80% of revenue (Pareto analysis)
- Computes month-over-month and year-over-year growth rates per segment
- Detects any underperforming segments relative to target or prior period
- Builds a contribution waterfall chart showing what drove overall revenue change
- Flags anomalies (sudden spikes or drops) with a possible explanation
- Provides a one-paragraph executive summary suitable for a board update
Use Python or SQL. Output charts and a narrative summary.

Anomaly detection in operational data

Anomaly Detection
prompt template
You are a data scientist specializing in anomaly detection. I have [time series / transactional / sensor] data with columns: [list fields]. I need to identify unusual patterns that may indicate [fraud / equipment failure / data pipeline errors / sudden business changes].
Build a detection system that:
- Applies statistical control charts (mean ± 2 or 3 sigma) as a baseline
- Implements Isolation Forest and LOF for multivariate anomaly detection
- Flags anomalies with a severity score (low / medium / high) based on deviation magnitude
- Plots anomalies in context on a time series chart with highlighted points
- Groups anomalies by type or likely root cause where possible
- Recommends an alerting threshold strategy for production use
- Outputs an anomaly log with timestamp, affected column, deviation value, and severity
Use Python (scikit-learn, pyod). Provide code and a summary of the most critical anomalies found.

Data pipeline audit and quality report

Data Quality
prompt template
Act as a data quality engineer. I need to audit the data flowing through [pipeline name or description] and generate a quality report for stakeholders. The pipeline processes [X records per day] from [source system] to [destination system / data warehouse]. Known or suspected issues include: [describe any known problems].
Perform an audit that:
- Profiles each key table or dataset (row counts, column completeness, cardinality)
- Detects schema drift between source and destination
- Validates business rules: [list 3–5 rules, e.g., "no negative revenue", "user_id must be unique"]
- Measures freshness: time since last update per table
- Identifies referential integrity violations across joined tables
- Scores overall data quality on a 1–10 scale with breakdown by dimension
- Generates a stakeholder-ready report with a priority-ranked issue list
Use Python (Great Expectations or pandas). Provide code and a formatted report template.

Market basket analysis

Retail & E-commerce Analytics
prompt template
You are a retail data analyst. I have transaction data with columns: [transaction_id, customer_id, product_id, product_name, quantity, date]. I want to find product associations to improve cross-sell and bundling strategies.
Perform market basket analysis that:
- Transforms data into a basket format (one row per transaction, columns per product)
- Runs the Apriori algorithm with minimum support: [X%] and confidence: [X%]
- Filters and ranks association rules by lift, confidence, and support
- Identifies the top 10 product pairs or triplets with the strongest lift
- Visualizes the association network as a graph
- Translates findings into 3 specific bundling or upsell recommendations
- Estimates potential revenue uplift if the top recommendation is implemented
Use Python (mlxtend). Include code and a business-ready summary table.
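
Support, confidence, and lift are the three numbers every rule in the output will carry, so it helps to know how they are computed. The sketch below counts product pairs by brute force on a toy basket list; it is not the Apriori algorithm itself (there is no candidate pruning), and the baskets are invented.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.0):
    """Return support, confidence (A -> B), and lift for each product pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets with both items
        if support < min_support:
            continue
        confidence = count / item_counts[a]      # P(B in basket | A in basket)
        lift = confidence / (item_counts[b] / n) # > 1 means bought together more than chance
        rules[(a, b)] = {"support": support, "confidence": confidence, "lift": lift}
    return rules

# Hypothetical transactions.
baskets = [["bread", "butter"], ["bread", "butter", "jam"], ["bread"], ["jam"]]
rules = pair_rules(baskets)
print(rules[("bread", "butter")])
```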

Price sensitivity and elasticity analysis

Pricing Analytics
prompt template
Act as a pricing analyst. I have historical data on [product / service] sales including: [price, quantity sold, date, segment or region, and any promotion flags].
Analyze price sensitivity and elasticity by:
- Plotting demand curve (price vs. quantity) for each key segment
- Calculating price elasticity of demand (PED) with confidence intervals
- Identifying the price point that maximizes revenue vs. volume
- Segmenting customers by price sensitivity (elastic vs. inelastic buyers)
- Testing whether promotions have a lasting or temporary demand effect
- Building a simple price optimization model using regression
- Recommending 2–3 pricing strategy changes with projected impact
Use Python (statsmodels, scipy). Provide code, charts, and a pricing recommendation memo.
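
One common way to estimate PED is a log-log regression, where the slope of ln(quantity) on ln(price) is read as the elasticity. This is a minimal sketch of that idea under a constant-elasticity assumption, using synthetic data built so the true value is known; real data would also need the confidence intervals and promotion controls the prompt asks for.

```python
from math import log

def log_log_elasticity(prices, quantities):
    """OLS slope of ln(quantity) on ln(price): a constant-elasticity estimate."""
    xs = [log(p) for p in prices]
    ys = [log(q) for q in quantities]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Synthetic data generated from Q = 1000 * P^-1.5, so the slope should recover -1.5.
prices = [5.0, 8.0, 10.0, 12.0, 20.0]
quantities = [1000 * p ** -1.5 for p in prices]
elasticity = log_log_elasticity(prices, quantities)
print(f"estimated elasticity: {elasticity:.3f}")
```

An estimate below -1 indicates elastic demand, where a price cut raises revenue; between -1 and 0 indicates inelastic demand.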

Employee attrition analysis

HR Analytics
prompt template
You are an HR analytics expert. I have employee data with columns: [employee_id, department, role, tenure, salary, performance_rating, attrition (yes/no), engagement_score, manager_id, and others].
Analyze attrition by:
- Calculating overall and department-level attrition rates
- Identifying which roles, tenure bands, and pay grades have the highest attrition
- Running a logistic regression to find the top 5 drivers of attrition
- Building a survival analysis to estimate expected tenure by employee segment
- Flagging high-risk current employees based on the model (top 10%)
- Comparing attrition drivers vs. industry benchmarks where possible
- Recommending 3 targeted retention interventions with estimated cost vs. impact
Use Python. Provide code, charts, and a summary report for the CHRO.

Geospatial data analysis

Geospatial Analytics
prompt template
Act as a geospatial data analyst. I have a dataset with [X records] containing location data (latitude, longitude or address) along with: [other relevant fields like sales, incidents, customers, assets].
Perform geospatial analysis that:
- Geocodes addresses to coordinates if needed
- Plots all records on an interactive map with color-coded markers by [metric or category]
- Performs clustering to identify geographic hotspots using DBSCAN or KMeans
- Calculates distance-based metrics (e.g., nearest store, service area coverage)
- Builds a choropleth map by [region / zip code / city] for [key metric]
- Identifies underserved or overserved geographic areas
- Recommends 2–3 location-based strategic actions based on the findings
Use Python (geopandas, folium, plotly). Provide code and an interactive HTML map output.

Natural language processing on survey data

NLP & Text Analytics
prompt template
You are an NLP analyst. I have [X responses] from a customer or employee survey. The open-ended question was: "[paste question text]". Responses are in column [column name].
Analyze the text data by:
- Cleaning the text (lowercasing, stopword removal, lemmatization)
- Running sentiment analysis on each response (positive / negative / neutral) with confidence score
- Extracting the top 20 keywords and bigrams using TF-IDF
- Identifying 5–8 themes using topic modeling (LDA or BERTopic)
- Grouping responses by sentiment x theme matrix
- Flagging the most critical negative responses for immediate review
- Generating a one-page executive summary of what respondents care most about
Use Python (NLTK, transformers, gensim). Provide code and the summary output.

Financial forecasting and variance analysis

Finance Analytics
prompt template
Act as a financial analyst. I have [monthly / quarterly] actuals vs. budget data for [business unit / product / cost center] covering [time period]. Columns include: [revenue, COGS, gross margin, opex line items, net income, and period].
Perform a full variance analysis that:
- Calculates actual vs. budget variance in absolute and percentage terms for each line item
- Identifies the top 3 drivers of favorable and unfavorable variance
- Builds a bridge (waterfall) chart showing how each variance contributed to the net income gap
- Runs a rolling 12-month forecast using actuals trend extrapolation
- Flags any line items where variance exceeds [X%] threshold for escalation
- Performs scenario analysis: best case, base case, worst case for next [X quarters]
- Produces a CFO-ready summary slide narrative with headline numbers and commentary
Use Python or Excel. Provide code or formulas and the narrative template.

Root cause analysis with data

Diagnostics
prompt template
Act as an analytical problem-solver. A key metric ([metric name, e.g., conversion rate / NPS / order volume]) dropped by [X%] between [date A] and [date B]. I need to identify the root cause. Available data: [describe datasets available, e.g., web analytics, transaction logs, CRM data, support tickets].
Perform a data-driven root cause analysis by:
- Confirming the metric drop is real (not a tracking error or data pipeline issue)
- Breaking the metric down by all available dimensions (segment, region, channel, product)
- Isolating which dimension(s) account for most of the drop using a contribution analysis
- Analyzing whether the issue started suddenly or gradually
- Cross-referencing with events log (releases, campaigns, external events) for that period
- Forming and testing the top 3 hypotheses with statistical evidence
- Recommending a corrective action for the most likely root cause with an implementation timeline
Provide a structured RCA report with supporting charts and evidence.

Dashboard design and KPI framework

Reporting & BI
prompt template
You are a BI analyst and dashboard designer. I need to build an executive dashboard for [business function: marketing / sales / operations / finance] tracking performance for [audience: C-suite / department head / team lead]. The business goal is: [describe the decision the dashboard should support]. Available data sources: [list systems, e.g., Salesforce, Google Analytics, ERP]. Key metrics I track today: [list current metrics].
Design the dashboard by:
- Recommending a KPI hierarchy (north star metric, secondary metrics, diagnostic metrics)
- Mapping each metric to a specific business question it answers
- Specifying the best visualization type for each metric and why
- Designing the layout: which metrics appear on top, how to group related KPIs
- Defining alert thresholds for each KPI
- Recommending refresh frequency (real-time / daily / weekly) per metric
- Providing a mockup description or wireframe in text format ready for a BI tool like Tableau, Power BI, or Looker
Include a one-paragraph rationale for the overall design decisions.

Data-driven marketing attribution

Marketing Analytics
prompt template
Act as a marketing data analyst. I need to attribute [revenue / conversions / sign-ups] across the following channels: [email, paid search, organic, social, direct, referral]. I have user-level touchpoint data with columns: [user_id, channel, touchpoint_date, conversion_flag, conversion_value].
Build a multi-touch attribution analysis that:
- Compares four models: first touch, last touch, linear, and time decay
- Calculates attributed revenue and conversion count per channel under each model
- Visualizes how attribution share shifts across models for each channel
- Identifies which channels are undervalued or overvalued by last-touch attribution
- Recommends a data-driven attribution model using Shapley values
- Estimates the incremental ROAS (return on ad spend) per channel
- Produces a channel investment reallocation recommendation based on findings
Use Python. Provide code and a channel performance scorecard.
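
The four rule-based models differ only in how they weight the touchpoints on a converting path. The sketch below shows that weighting for a single hypothetical path; the 7-day half-life for time decay is an assumed parameter, and the Shapley-based model is out of scope here.

```python
def attribute(path, model, half_life_days=7.0):
    """path: list of (channel, days_before_conversion), oldest touch first.
    Returns {channel: share of credit}, with shares summing to 1."""
    if model == "first_touch":
        weights = [1.0] + [0.0] * (len(path) - 1)
    elif model == "last_touch":
        weights = [0.0] * (len(path) - 1) + [1.0]
    elif model == "linear":
        weights = [1.0] * len(path)
    elif model == "time_decay":
        # Credit halves for every half_life_days between touch and conversion.
        weights = [0.5 ** (days / half_life_days) for _, days in path]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    credit = {}
    for (channel, _), w in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit

# Hypothetical converting path.
path = [("paid search", 14), ("email", 7), ("direct", 0)]
shares = {m: attribute(path, m) for m in ("first_touch", "last_touch", "linear", "time_decay")}
for model, credit in shares.items():
    print(model, credit)
```

Multiplying each share by the conversion value and summing over all paths gives attributed revenue per channel under each model.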

Supply chain analytics and demand forecasting

Operations Analytics
prompt template
You are a supply chain analyst. I have [X months] of historical demand data with columns: [product_id, product_name, date, units_sold, units_returned, inventory_level, lead_time_days, stockout_flag].
Perform a supply chain analysis that:
- Calculates demand variability (CV) per product to classify into fast / slow / erratic movers
- Builds a demand forecast for each product for the next [X weeks / months] using an appropriate model (moving average, exponential smoothing, or ML-based)
- Calculates safety stock levels using service level target of [X%]
- Identifies products at risk of stockout in the next [X days] given current inventory
- Flags slow-moving or dead stock exceeding [X days] of inventory cover
- Recommends reorder points and order quantities per SKU
- Estimates the working capital impact of the recommended inventory policy vs. current policy
Use Python. Provide code and a prioritized reorder action list.

Regression analysis for business drivers

Statistics
prompt template
Act as a quantitative analyst. I want to understand what drives [outcome variable, e.g., customer lifetime value / monthly revenue / support ticket volume] using regression analysis. Dataset: [X rows], columns include: [list predictor variables and the outcome variable].
Run a full regression analysis that:
- Checks assumptions: linearity, normality of residuals, homoscedasticity, multicollinearity (VIF)
- Runs OLS regression and interprets each coefficient in plain business terms
- Identifies and removes or handles outliers and high-leverage points
- Tests for interaction effects between [specific variable pairs]
- Uses stepwise or LASSO feature selection to find the minimal predictive model
- Reports R-squared, adjusted R-squared, F-statistic, and AIC/BIC
- Produces a coefficient plot with confidence intervals and a plain-English interpretation of each significant predictor
Use Python (statsmodels, sklearn). Provide code and a business-friendly results table.

Sentiment analysis on product reviews

NLP & Text Analytics
prompt template
You are an NLP specialist. I have [X product reviews] for [product name] from [platform, e.g., Amazon, G2, App Store]. The dataset has columns: [review_text, rating, date, verified_purchase, and any other metadata].
Analyze reviews by:
- Classifying sentiment at review level (positive / negative / neutral)
- Extracting aspect-level sentiment: identify how customers feel about [price, quality, delivery, support, UX, etc.] separately
- Tracking sentiment trend over [time period]: is sentiment improving or declining?
- Identifying the top 10 most common complaints and the top 10 most praised features
- Flagging reviews with high impact (verified + long + negative) for product team review
- Comparing sentiment distribution across rating levels (e.g., on which aspects do 3-star reviews skew most negative?)
- Generating a product feedback summary memo with prioritized action items for the product team
Use Python (transformers / VADER / spaCy). Provide code and the final memo.

Network analysis for business relationships

Advanced Analytics
prompt template
Act as a network analyst. I have relational data representing [business relationships: customer referrals / supplier connections / employee collaboration / transaction networks]. The dataset has: [node_1, node_2, relationship_type, weight or frequency, date].
Perform a network analysis that:
- Builds a directed or undirected graph from the data
- Calculates centrality measures: degree, betweenness, closeness, and PageRank for each node
- Identifies the top 10 most influential nodes and explains what their centrality means in business terms
- Detects communities or clusters within the network
- Finds any critical bridge nodes whose removal would fragment the network
- Visualizes the network with node size proportional to influence and color by community
- Recommends 2–3 strategic actions based on the network structure (e.g., key account prioritization, risk nodes, partnership opportunities)
Use Python (networkx, plotly). Provide code and a strategic summary.

Lifetime value (LTV) modeling

Customer Analytics
prompt template
Act as a customer analytics expert. I need to calculate and model customer lifetime value (LTV) for [business type, e.g., SaaS, e-commerce, subscription box]. Dataset: [columns including customer_id, acquisition_date, orders or payments with amounts, churn_date if applicable, acquisition_channel, plan or product type].
Build a full LTV model that:
- Calculates historical LTV for each customer (sum of gross margin over lifetime)
- Segments customers into LTV tiers: top 10%, mid 40%, bottom 50%
- Builds a predictive LTV model using BG/NBD + Gamma-Gamma (for non-contractual) or regression (for contractual)
- Compares predicted LTV by acquisition channel and cohort
- Calculates payback period per channel: LTV vs. CAC
- Identifies the customer attributes most correlated with high LTV
- Recommends acquisition and retention strategies for each LTV segment
Use Python (lifetimes library or custom). Provide code, an LTV distribution chart, and a strategic summary.

Data storytelling and presentation builder

Communication & Reporting
prompt template
Act as a data storytelling expert. I have the following analysis results to present: [paste your key findings, charts, or tables].
My audience is: [e.g., CEO, VP of Marketing, board of directors]
Their key question is: [e.g., "Should we expand into the EMEA market?" / "Is our growth sustainable?"]
Decision to be made: [describe the decision]
Build a data-driven narrative that:
- Opens with a single headline insight that directly answers the audience's key question
- Structures the story as: situation → complication → resolution (SCR framework)
- Selects the 3 most compelling data points and writes a plain-English sentence for each
- Recommends which charts to include and where in the narrative flow
- Writes transitions between sections that maintain analytical logic
- Anticipates the top 3 questions the audience will ask and provides pre-drafted answers with supporting data
- Ends with a clear call to action tied to the business decision
Produce the full narrative script and a slide-by-slide outline.

Inventory optimization analysis

Operations Analytics
prompt template
You are an operations research analyst. I have inventory and sales data with columns: [product_id, sku, current_stock, average_daily_demand, demand_std_dev, lead_time_days, holding_cost_per_unit, stockout_cost_per_unit, order_cost].
Optimize my inventory policy by:
- Calculating EOQ (Economic Order Quantity) for each SKU
- Computing safety stock at [90% / 95% / 99%] service levels
- Setting reorder point (ROP) for each SKU
- Identifying SKUs where current stock is critically below ROP
- Flagging overstock items (current stock > [X days] of demand cover) and calculating holding cost impact
- Performing ABC-XYZ classification: value x demand variability
- Recommending differentiated replenishment policies by ABC-XYZ class with estimated cost savings vs. current policy
Use Python. Provide code and a prioritized reorder action table.
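
The first three steps rest on textbook formulas, sketched below so you can check ChatGPT's numbers for a single SKU. The sketch assumes normally distributed, independent daily demand, a fixed lead time, and a holding cost expressed per unit per year; the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def inventory_policy(daily_demand, demand_std, lead_time_days,
                     order_cost, holding_cost_per_unit_year, service_level=0.95):
    """Classic EOQ, safety stock, and reorder point for one SKU."""
    annual_demand = daily_demand * 365
    # EOQ = sqrt(2 * D * S / H)
    eoq = sqrt(2 * annual_demand * order_cost / holding_cost_per_unit_year)
    # Safety stock = z * sigma_daily * sqrt(lead time)
    z = NormalDist().inv_cdf(service_level)
    safety_stock = z * demand_std * sqrt(lead_time_days)
    # Reorder point = expected demand over the lead time + safety stock
    reorder_point = daily_demand * lead_time_days + safety_stock
    return eoq, safety_stock, reorder_point

# Hypothetical SKU.
eoq, safety_stock, rop = inventory_policy(
    daily_demand=20, demand_std=5, lead_time_days=9,
    order_cost=50, holding_cost_per_unit_year=2,
)
print(f"EOQ={eoq:.1f}  safety stock={safety_stock:.1f}  ROP={rop:.1f}")
```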

Competitive benchmarking with data

Business Analytics
prompt template
Act as a competitive intelligence analyst. I have data on [X competitors] across metrics like: [revenue growth, market share, pricing, NPS, product features, employee count, Glassdoor rating, ad spend, etc.]. Sources include: [public reports, web scrapes, industry databases, survey data].
Build a competitive benchmarking analysis that:
- Normalizes all metrics to a common scale for fair comparison
- Creates a competitive positioning matrix (2x2 or radar chart) using [X vs. Y dimensions]
- Identifies areas where our company leads, is at parity, or lags behind
- Scores each competitor on a weighted KPI scorecard
- Highlights metrics where competitors have improved fastest over the past [X quarters]
- Identifies one "white space" opportunity no competitor is currently winning
- Produces a one-page competitive summary with a strategic priority list
Use Python or Excel. Provide code and the formatted output.

Social media analytics and performance reporting

Marketing Analytics
prompt template
You are a social media analytics expert. I have platform data from [LinkedIn / Instagram / Twitter / TikTok / YouTube] with columns: [post_id, date, format, impressions, reach, engagements, clicks, conversions, follower_count, post_copy].
Build a performance analysis that:
- Calculates engagement rate, CTR, and CPE (cost per engagement if paid data available) per post
- Identifies the top 10 and bottom 10 performing posts by [primary KPI]
- Analyzes performance by content format (video vs. image vs. carousel vs. text)
- Tracks follower growth rate and correlates spikes with specific posts or campaigns
- Detects the best posting day and time for [reach / engagement / conversions]
- Compares [current period] vs. [prior period] performance across all metrics
- Recommends a content calendar strategy for next [month / quarter] based on findings
Use Python. Provide code and a monthly performance report template.

Fraud detection model

ML & Prediction
prompt template
Act as a fraud analytics specialist. I have transaction data with columns: [transaction_id, user_id, amount, merchant_category, timestamp, device_type, location, is_fraud (label)]. The fraud rate is approximately [X%].
Build a fraud detection system that:
- Performs feature engineering: transaction velocity, amount deviation from user baseline, geographic anomalies, time-of-day patterns
- Handles extreme class imbalance using SMOTE, class weighting, and anomaly detection approaches
- Trains and compares: Logistic Regression, Random Forest, XGBoost, and Isolation Forest
- Optimizes for recall (minimize false negatives / missed fraud) while keeping precision acceptable
- Selects optimal classification threshold based on the cost matrix: [false negative cost = $X, false positive cost = $Y]
- Explains flagged transactions using SHAP local explanations
- Outputs a real-time scoring function that takes a new transaction and returns a fraud probability and risk tier
Provide Python code, evaluation metrics, and a risk tier decision guide.

Web analytics and conversion optimization

Digital Analytics
prompt template
Act as a web analytics expert. I have Google Analytics or similar data covering [time period] for [website / app]. Available dimensions include: [sessions, users, bounce rate, pages per session, goal completions, revenue, traffic source, device type, landing page, and others].
Perform a full web analytics audit that:
- Identifies top traffic sources by volume and by conversion rate
- Finds the highest-traffic pages with the lowest conversion rates (biggest opportunity pages)
- Analyzes mobile vs. desktop performance gap across all key metrics
- Builds a session quality score using engagement proxies (time on page, pages per session, scroll depth)
- Detects pages with unusually high exit rates and flags them for UX review
- Identifies the content or landing pages most correlated with eventual conversion
- Recommends a prioritized CRO (conversion rate optimization) experiment roadmap with estimated impact
Use Python (or GA4 API). Provide code and a CRO priority matrix.

Bayesian A/B test analysis

Experimentation
prompt template
Act as a Bayesian statistician. I have results from an A/B test: Control (A) conversions: [X] out of [N] Treatment (B) conversions: [X] out of [N] Perform a Bayesian analysis that: Sets up a Beta-Binomial model with an appropriate prior (justify your choice) Calculates the posterior distribution for each variant's conversion rate Plots both posterior distributions on one chart with credible intervals Computes the probability that B is better than A Calculates the expected loss from choosing each variant Determines if a decision can be made now or if more data is needed Summarizes the result in plain language for a non-technical product manager Use Python (pymc or scipy). Provide code, the posterior plot, and a decision recommendation.
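
To show what the Beta-Binomial part of this prompt produces, here is a minimal sketch using only the standard library. It assumes a uniform Beta(1, 1) prior and made-up conversion counts; the function name `beta_binomial_ab` is hypothetical. The probability and expected-loss figures are Monte Carlo estimates, so they vary slightly with the seed.

```python
import random


def beta_binomial_ab(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=20000, seed=0):
    """Posterior Beta parameters plus Monte Carlo estimates of P(B > A) and expected loss."""
    rng = random.Random(seed)
    a0, b0 = prior
    # Conjugate update: Beta(a0 + successes, b0 + failures)
    post_a = (a0 + conv_a, b0 + n_a - conv_a)
    post_b = (a0 + conv_b, b0 + n_b - conv_b)
    wins_b = 0
    loss_a = loss_b = 0.0
    for _ in range(draws):
        pa = rng.betavariate(*post_a)
        pb = rng.betavariate(*post_b)
        wins_b += pb > pa
        loss_a += max(pb - pa, 0.0)  # conversion rate given up by choosing A
        loss_b += max(pa - pb, 0.0)  # conversion rate given up by choosing B
    return {
        "posterior_a": post_a,
        "posterior_b": post_b,
        "p_b_better": wins_b / draws,
        "expected_loss_a": loss_a / draws,
        "expected_loss_b": loss_b / draws,
    }


# Illustrative counts only: 120/2400 for control, 156/2400 for treatment.
result = beta_binomial_ab(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
```

A common decision rule, offered here as one option rather than a standard, is to ship a variant once its expected loss falls below a tolerance you set in advance.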

Demographic analysis and segmentation

Customer Analytics
prompt template
You are a market research analyst. I have demographic survey data with columns: [age, gender, income_bracket, education_level, location, product_usage_frequency, satisfaction_score, NPS, and others]. Perform a demographic analysis that: Profiles the overall respondent base (distributions for each demographic variable) Cross-tabulates satisfaction and NPS by each demographic segment Identifies which demographic groups have the highest and lowest satisfaction Tests whether demographic differences in satisfaction are statistically significant Builds a persona for the top 3 most engaged demographic segments Maps demographic segments to product or marketing recommendations Identifies any underrepresented demographic segments and flags potential survey bias Use Python. Provide code, a segment summary table, and 3 persona write-ups.

Predictive maintenance analysis

Operations Analytics
prompt template
Act as a reliability engineer and data scientist. I have sensor or operational data from [X machines / assets] with columns: [asset_id, timestamp, sensor readings (temperature, vibration, pressure, etc.), maintenance_logs, failure_flag]. Build a predictive maintenance model that: Engineers time-windowed features: rolling mean, rolling std, rate of change per sensor over [X hours / days] Labels failure precursor windows: flag data [X hours] before each failure event Trains a classifier to predict failure probability in the next [X hours] Evaluates using precision, recall, and lead time: how early does the model detect failure? Identifies which sensors are the strongest predictors using SHAP Builds a maintenance scheduling rule: trigger alert when probability exceeds [X%] Estimates cost savings from predictive vs. reactive maintenance based on [downtime cost per hour] Use Python (scikit-learn, tsfresh). Provide code, a confusion matrix, and a cost-benefit summary.
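
Two steps in this prompt, rolling features and precursor-window labeling, are easy to get subtly wrong, so a small sketch may help. The field names, the sample readings, and the functions `rolling_mean` and `label_precursor_windows` are assumptions for illustration; a real pipeline would more likely use pandas.

```python
from datetime import datetime, timedelta
from statistics import mean


def rolling_mean(values, window):
    """Trailing mean over up to `window` most recent values (shorter at the start)."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def label_precursor_windows(readings, failures, horizon_hours):
    """Mark readings taken within `horizon_hours` before a failure of the same asset."""
    horizon = timedelta(hours=horizon_hours)
    labeled = []
    for r in readings:
        is_precursor = any(
            asset == r["asset_id"] and timedelta(0) < failed_at - r["timestamp"] <= horizon
            for asset, failed_at in failures
        )
        labeled.append({**r, "failure_within_horizon": int(is_precursor)})
    return labeled


# Hypothetical readings and one failure event.
readings = [
    {"asset_id": "pump-1", "timestamp": datetime(2024, 1, 1, 0), "vibration": 0.2},
    {"asset_id": "pump-1", "timestamp": datetime(2024, 1, 1, 20), "vibration": 0.9},
    {"asset_id": "pump-2", "timestamp": datetime(2024, 1, 1, 20), "vibration": 0.3},
]
failures = [("pump-1", datetime(2024, 1, 2, 6))]
labeled = label_precursor_windows(readings, failures, horizon_hours=12)
```

Only the second reading is labeled positive: it belongs to the failing asset and falls 10 hours before the failure, inside the 12-hour horizon.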

Cross-sell and upsell opportunity analysis

Revenue Analytics
prompt template
Act as a revenue analytics specialist. I have purchase history data with columns: [customer_id, product_id, product_category, purchase_date, revenue, customer_segment, account_manager]. Identify cross-sell and upsell opportunities by: Calculating product penetration rate by customer segment Identifying which products are most commonly purchased together (use association rules) Finding customers who buy Product A but not Product B, where B has high affinity with A Scoring each customer-product pair for upsell potential using purchase recency, frequency, and fit Ranking the top 50 accounts by estimated expansion revenue opportunity Mapping each opportunity to a recommended outreach action and suggested offer Estimating total addressable expansion revenue if top 20% of opportunities are converted Use Python. Provide code and a prioritized account expansion list ready for the sales team.
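
The association-rules step boils down to three numbers per product pair: support, confidence, and lift. The sketch below computes them for pairs only, on made-up baskets; `pair_rules` and the product names are hypothetical, and a full Apriori or FP-Growth implementation would also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Support, confidence, and lift for every product pair above min_support."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]      # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append({"antecedent": x, "consequent": y,
                          "support": support, "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# One basket per customer; product names are illustrative.
baskets = [
    {"crm", "email"},
    {"crm", "email", "analytics"},
    {"crm"},
    {"email", "analytics"},
    {"crm", "email"},
]
rules = pair_rules(baskets)
```

A lift above 1 suggests the two products are bought together more often than chance would predict, which is the affinity signal the prompt uses to find customers who own A but not B.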

Multivariate testing analysis

Experimentation
prompt template
Act as an experimentation analyst. I ran a multivariate test on [landing page / email / ad] with the following element variants: Element 1 (headline): [A, B, C] Element 2 (CTA button): [A, B] Element 3 (hero image): [A, B] Primary metric: [conversion rate] Total sessions: [X] split across [N] combinations Analyze the results by: Calculating conversion rate and statistical significance for each combination Isolating the marginal effect of each element using factorial analysis Identifying the winning combination and quantifying its improvement over control Checking for interaction effects between elements Correcting for multiple comparisons using Bonferroni or FDR correction Estimating the revenue impact of deploying the winning combination Recommending which elements to lock in and which need further testing Use Python. Provide code and a results summary table.
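
The multiple-comparisons step is worth seeing concretely, because Bonferroni and FDR control can disagree on the same results. This sketch implements Bonferroni and the Benjamini-Hochberg procedure (one common FDR method) on illustrative p-values; the numbers are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# One p-value per variant combination versus control (illustrative values).
p_values = [0.001, 0.008, 0.02, 0.03, 0.20, 0.74]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
```

On these values Bonferroni keeps two combinations as significant while Benjamini-Hochberg keeps four, which reflects the usual trade-off: Bonferroni is stricter about any false positive, FDR tolerates a controlled share of them.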

Exploratory analysis of unstructured log data

Engineering Analytics
prompt template
You are a data engineer and analyst. I have server or application log files in [format: plain text, JSON, syslog] covering [time period]. The logs contain fields like: [timestamp, severity level, service name, user_id, error_code, message text]. Parse and analyze the logs by: Parsing raw log lines into a structured DataFrame Counting error frequency by type, service, and time Identifying recurring error patterns or error bursts using time bucketing Correlating error spikes with deployment events or traffic surges Extracting the most common error messages using clustering or frequency analysis Identifying users or sessions most frequently encountering errors Producing an operational health report: top 5 critical issues, their frequency, and recommended fixes Use Python (pandas, regex). Provide log parsing code and the health report output.
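
As a rough picture of the parsing and time-bucketing steps, the sketch below assumes a simple space-delimited plain-text format (timestamp, severity, service, error code, message). That format, the sample lines, and the names `LOG_PATTERN`, `parse_line`, and `errors_per_minute` are all assumptions; real logs will need their own pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: "<ISO timestamp> <SEVERITY> <service> <error_code> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<service>\S+)"
    r"\s+(?P<error_code>\S+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None when the line does not match the pattern."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S")
    return record


def errors_per_minute(records):
    """Count ERROR records in one-minute buckets."""
    buckets = Counter()
    for r in records:
        if r["severity"] == "ERROR":
            buckets[r["timestamp"].replace(second=0)] += 1
    return buckets


lines = [
    "2024-05-01T12:00:03 ERROR payments E42 card declined",
    "2024-05-01T12:00:41 ERROR payments E42 card declined",
    "2024-05-01T12:01:10 INFO auth - login ok",
    "2024-05-01T12:01:15 ERROR auth E07 token expired",
    "malformed line",
]
records = [r for r in map(parse_line, lines) if r is not None]
buckets = errors_per_minute(records)
by_service_and_code = Counter(
    (r["service"], r["error_code"]) for r in records if r["severity"] == "ERROR"
)
```

Lines that fail to parse are dropped here; for an audit it is usually worth counting them too, since a high reject rate means the pattern is wrong.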

Sales pipeline and forecast analysis

Sales Analytics
prompt template
Act as a sales operations analyst. I have CRM pipeline data with columns: [deal_id, stage, amount, close_date, owner, product_line, lead_source, created_date, days_in_stage, probability]. Analyze the pipeline by: Calculating pipeline coverage ratio: total pipeline value vs. quota for [quarter] Identifying deals at risk: past expected close date, stuck in stage for > [X days], or low engagement signals Building a weighted forecast: sum of (amount x probability) vs. a regression-based forecast Analyzing win rate and average deal size by stage, rep, product line, and lead source Detecting pipeline leakage: deals that moved backward in stage or were suddenly lost Forecasting end-of-quarter attainment under three scenarios (commit / upside / downside) Recommending 3 pipeline actions for the sales manager to take this week Use Python or SQL. Provide code and a deal risk heat map.
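
The coverage ratio, weighted forecast, and stuck-deal check named in this prompt are simple arithmetic, sketched below. The stage names, the deal records, and the function `pipeline_summary` are hypothetical; the column names follow the prompt.

```python
CLOSED_STAGES = ("closed_won", "closed_lost")  # assumed stage labels


def pipeline_summary(deals, quota, max_days_in_stage):
    """Coverage ratio, probability-weighted forecast, and stuck deals for open pipeline."""
    open_deals = [d for d in deals if d["stage"] not in CLOSED_STAGES]
    total_value = sum(d["amount"] for d in open_deals)
    weighted = sum(d["amount"] * d["probability"] for d in open_deals)
    stuck = [d["deal_id"] for d in open_deals if d["days_in_stage"] > max_days_in_stage]
    return {
        "coverage_ratio": total_value / quota,
        "weighted_forecast": weighted,
        "stuck_deals": stuck,
    }


# Illustrative deals only.
deals = [
    {"deal_id": "D1", "stage": "proposal", "amount": 40000, "probability": 0.5, "days_in_stage": 12},
    {"deal_id": "D2", "stage": "negotiation", "amount": 60000, "probability": 0.75, "days_in_stage": 45},
    {"deal_id": "D3", "stage": "discovery", "amount": 100000, "probability": 0.1, "days_in_stage": 5},
    {"deal_id": "D4", "stage": "closed_lost", "amount": 30000, "probability": 0.0, "days_in_stage": 0},
]
summary = pipeline_summary(deals, quota=100000, max_days_in_stage=30)
```

Here the open pipeline covers quota twice over, yet the weighted forecast is only 75,000, which is why the prompt asks for both views.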

Data anonymization and privacy compliance check

Data Governance
prompt template
Act as a data privacy engineer. I have a dataset I need to share externally (with a vendor / for analytics / for research) that contains potentially sensitive information. Columns include: [list all columns]. Perform a privacy compliance review and anonymization that: Classifies each column by sensitivity level: PII, quasi-identifier, sensitive attribute, non-sensitive Identifies re-identification risk from combinations of quasi-identifiers (k-anonymity check) Applies appropriate anonymization: masking, pseudonymization, generalization, or suppression per column Verifies the anonymized dataset meets k-anonymity where k >= [3 / 5 / 10] Tests that anonymized data preserves the statistical properties needed for the intended analysis Documents all transformations for compliance records Produces a privacy impact assessment summary suitable for a DPO review Use Python. Provide code and the compliance documentation template.
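
The k-anonymity check can be pictured as a group-by on the quasi-identifiers: k is the size of the smallest group. The sketch below runs the check before and after generalizing age into 10-year bands. The records, column names, and the helpers `k_anonymity` and `generalize_age` are made up for illustration.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return (k, groups): k is the size of the smallest quasi-identifier group."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()), groups


def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Toy records: age and zip3 are treated as quasi-identifiers.
rows = [
    {"age": 34, "zip3": "941", "diagnosis": "A"},
    {"age": 36, "zip3": "941", "diagnosis": "B"},
    {"age": 38, "zip3": "941", "diagnosis": "A"},
    {"age": 52, "zip3": "100", "diagnosis": "C"},
    {"age": 57, "zip3": "100", "diagnosis": "A"},
]
k_raw, _ = k_anonymity(rows, ["age", "zip3"])
generalized = [{**r, "age": generalize_age(r["age"])} for r in rows]
k_generalized, groups = k_anonymity(generalized, ["age", "zip3"])
```

Generalization lifts k from 1 to 2 in this toy set. Reaching a higher target would take wider bands or suppression of the smallest groups, at some cost to analytical detail.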

Cluster profiling and business interpretation

Segmentation
prompt template
Act as a customer analytics expert. I have already run K-Means clustering and produced [X clusters] on a dataset with columns: [list features used]. Each customer now has a cluster label assigned. Profile and interpret the clusters by: Calculating the mean and median of each feature by cluster Identifying the top 3 distinguishing features per cluster using ANOVA F-test or feature importance Writing a plain-English persona for each cluster: name, key traits, likely behaviors, and needs Visualizing clusters using a radar chart overlaid for all segments Sizing each cluster: count, percentage of total, and share of [revenue / orders / usage] Mapping each cluster to the most appropriate product, pricing, or marketing strategy Recommending a measurement plan to track how cluster membership changes over time Provide Python code and a cluster profile card for each segment.

Data audit for a new dataset

Data Quality
prompt template
Act as a data analyst onboarding a new dataset. I have just received a [CSV / database table / JSON file] from [source: a vendor / internal team / survey platform] that I have never worked with before. The dataset has [X rows] and [X columns]. Perform a thorough first-pass audit that: Inventories all columns: name, data type, sample values, and suspected meaning Calculates missing value rates and flags columns above [X%] missing Identifies suspicious values: nulls coded as strings, dates out of expected range, negative values in non-negative fields Detects likely duplicate records using fuzzy matching on [key identifier columns] Validates value distributions against business expectations (e.g., "age should be between 18 and 90") Assesses data freshness and coverage: what time period does this actually cover? Produces a data dictionary draft and a data quality scorecard Use Python. Provide code and the data dictionary + scorecard output.

Model monitoring and drift detection

ML Operations
prompt template
Act as an MLOps engineer. A predictive model was deployed [X weeks / months] ago. I now have [X weeks] of production prediction logs with columns: [timestamp, input_features, predicted_label or score, actual_label if available]. Set up model monitoring by: Detecting feature drift: compare production feature distributions vs. training distributions using KS test and PSI Detecting prediction drift: compare production prediction score distributions over time Monitoring model accuracy: calculate performance metrics on any labeled production data available Identifying underperforming data slices: are there specific segments where accuracy has degraded? Setting up statistical process control charts for key metrics with alert thresholds Producing a model health dashboard with weekly rollup Recommending a retraining trigger policy based on drift severity observed Use Python (evidently, scipy). Provide code and the monitoring dashboard spec.
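
PSI (population stability index) is the less familiar of the two drift tests named in this prompt, so here is a minimal sketch. It assumes equal-width bins built from the training range, with production values outside that range clipped into the edge bins and empty bins floored at a small epsilon. Those choices, and the function name `psi`, are assumptions; quantile bins are an equally common option.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index: sum over bins of (a - e) * ln(a / e)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Illustrative feature values: production is shifted upward relative to training.
train = [i / 100 for i in range(100)]
production = [v + 0.3 for v in train]
psi_same = psi(train, train)
psi_shifted = psi(train, production)
```

Widely quoted rules of thumb treat PSI under 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions rather than statistical tests, so set alert thresholds against your own history.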

Pricing tier and packaging analysis

Pricing & Revenue Analytics
prompt template
Act as a pricing strategist. I have data on [X customers] across [X pricing tiers or product packages] including: [customer_id, tier, mrr or arr, usage metrics, feature adoption flags, support ticket count, nps, churn_flag]. Analyze pricing tier performance by: Calculating revenue, churn rate, NPS, and support burden by tier Identifying which tier has the highest LTV and lowest churn Detecting customers who are under-tiered (high usage relative to their plan) Detecting customers who are over-tiered (low usage relative to their plan) Analyzing feature adoption: which features differentiate power users from low-usage customers? Running a willingness-to-pay analysis if survey or pricing experiment data is available Recommending a pricing tier restructuring with projected revenue impact Use Python. Provide code and a tier performance scorecard.

End-to-end analytics workflow builder

Workflow Automation
prompt template
Act as a senior data engineer and analyst. I need to build an automated end-to-end analytics workflow for [use case, e.g., weekly sales reporting / monthly churn analysis / real-time fraud scoring]. The workflow should: Extract data from [source: database / API / CSV] on [schedule: daily / weekly / on trigger] Run data validation checks before processing Transform and aggregate data according to [business logic: describe the KPIs and groupings needed] Apply [analytical model or rule, e.g., scoring model, statistical summary, forecasting function] Output results to [destination: dashboard, email report, Slack alert, database table] Log errors and send an alert if any step fails Be fully reproducible and parameterizable so it can be run for any time period on demand Use Python (pandas, SQLAlchemy, schedule or Airflow DAG). Provide full modular code with a README and deployment checklist.

Feature engineering for machine learning

ML & Prediction
prompt template
Act as a machine learning engineer. I have a raw dataset with columns: [list all columns and data types] that I need to prepare for a [classification / regression / ranking] model predicting [target variable]. Perform comprehensive feature engineering that: Creates interaction features between [specific column pairs] and explains the business rationale Encodes categorical variables using the most appropriate method per column (one-hot, target encoding, ordinal, binary) with justification Applies date/time decomposition: extract day of week, hour, month, quarter, days since reference date, is_weekend flag Engineers lag features and rolling window statistics (mean, std, min, max) for any time-dependent columns Creates binned or bucketed versions of skewed numeric columns Detects and removes or combines near-zero variance and perfectly correlated features Produces a final feature importance pre-check using mutual information scores before any model is trained Use Python (pandas, scikit-learn, feature-engine). Provide commented code and a feature catalog with business interpretation for each engineered variable.

Multi-dimensional KPI decomposition

Business Analytics
prompt template
Act as a strategic analyst. A key business KPI ([e.g., revenue per user / gross margin / CAC payback period]) has changed by [X%] between [period A] and [period B]. I need to fully decompose what drove this change. Decompose the KPI by: Breaking it into its mathematical components (e.g., revenue per user = sessions per user x conversion rate x AOV) Calculating the contribution of each component to the total change using a multiplicative or additive decomposition Further breaking down each component by [dimension: region / channel / product / segment] Identifying which single component and sub-dimension explains the largest share of the change Checking whether the change is broad-based or concentrated in a small subset of records Running a counterfactual: what would the KPI be if only [one component] had changed and everything else stayed flat? Producing a waterfall chart of contributions and a 3-sentence executive summary of the finding Use Python. Provide code, the waterfall chart, and the executive summary.
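
For a KPI that is a product of components, one common way to split the change is by log shares: each component receives the fraction of the total change equal to its log ratio divided by the log ratio of the KPI. The sketch below applies that convention, which is an assumption here since other allocation schemes exist, to made-up numbers for revenue per user = sessions per user x conversion rate x AOV.

```python
import math


def multiplicative_decomposition(before, after):
    """Split the change in a product-form KPI across components by log share."""
    kpi_a = math.prod(before.values())
    kpi_b = math.prod(after.values())
    delta = kpi_b - kpi_a
    total_log = math.log(kpi_b / kpi_a)
    if total_log == 0:
        return kpi_a, kpi_b, {name: 0.0 for name in before}
    contributions = {
        name: delta * math.log(after[name] / before[name]) / total_log
        for name in before
    }
    return kpi_a, kpi_b, contributions


def counterfactual(before, after, component):
    """KPI value if only `component` had moved and everything else stayed flat."""
    return math.prod(after[k] if k == component else before[k] for k in before)


# Illustrative period A and period B values.
before = {"sessions_per_user": 4.0, "conversion_rate": 0.05, "aov": 60.0}
after = {"sessions_per_user": 4.4, "conversion_rate": 0.045, "aov": 62.0}
kpi_a, kpi_b, contributions = multiplicative_decomposition(before, after)
only_conversion = counterfactual(before, after, "conversion_rate")
```

The contributions sum exactly to the total change, so they map directly onto the bars of a waterfall chart. In this toy case the conversion-rate drop is the only negative bar.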

Text classification model for support tickets

NLP & Text Analytics
prompt template
You are an NLP engineer. I have [X] customer support tickets with columns: [ticket_id, ticket_text, resolution_time_hours, agent_id, and optionally a category label for a subset]. I want to automatically classify incoming tickets into categories: [list your categories, e.g., billing, technical issue, account access, feature request, complaint]. Build a text classification pipeline that: Cleans and preprocesses ticket text: lowercasing, punctuation removal, stopword filtering, lemmatization Converts text to features using TF-IDF and separately using sentence embeddings (sentence-transformers) Trains classifiers: Logistic Regression, SVM, and a fine-tuned DistilBERT — compare all three Evaluates using macro F1 and per-class precision/recall — flag any category with F1 below [X%] Handles class imbalance if any category has fewer than [X] training examples Builds a confidence-based routing rule: high-confidence predictions auto-route, low-confidence flag for human review Saves the production pipeline and outputs predictions for all unlabeled tickets with confidence scores Use Python (scikit-learn, transformers, sentence-transformers). Provide code and a model performance report.

Geographic market expansion analysis

Business Analytics
prompt template
Act as a market expansion analyst. My company currently operates in [current markets]. I am evaluating expanding into [list of candidate markets / regions]. I have the following data available: [market size, population, competitor presence, regulatory complexity score, logistics cost index, historical performance in similar markets]. Build an expansion prioritization analysis that: Defines and weights scoring criteria: [market size, competition, cost to serve, strategic fit, regulatory risk] Normalizes all criteria to a 0–10 scale and calculates a weighted opportunity score per market Builds a 2x2 prioritization matrix: opportunity score vs. ease of entry Identifies the top 3 markets to enter first and the rationale for each Estimates Year 1 revenue potential for the top market using bottom-up assumptions Flags the biggest risk for each top market and proposes a mitigation Produces a one-page market entry scorecard ready for executive review Use Python or Excel. Provide the scoring model, the 2x2 chart, and the scorecard.
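
The normalize-and-weight step can be sketched in a few lines. This version uses min-max scaling to a 0-10 range and flips criteria where a lower raw value is better. The markets, raw values, weights, and the function `score_markets` are hypothetical.

```python
def score_markets(markets, weights, lower_is_better=()):
    """Min-max scale each criterion to 0-10, then take the weighted average per market."""
    total_weight = sum(weights.values())
    scaled = {name: {} for name in markets}
    for criterion in weights:
        values = [m[criterion] for m in markets.values()]
        lo, hi = min(values), max(values)
        for name, m in markets.items():
            s = 5.0 if hi == lo else 10 * (m[criterion] - lo) / (hi - lo)
            # Flip the scale for cost-like criteria so that 10 is always "best".
            scaled[name][criterion] = 10 - s if criterion in lower_is_better else s
    return {
        name: sum(scaled[name][c] * w for c, w in weights.items()) / total_weight
        for name in markets
    }


# Illustrative inputs: competition and regulatory risk count against a market.
markets = {
    "A": {"market_size": 100, "competition": 8, "regulatory_risk": 2},
    "B": {"market_size": 50, "competition": 3, "regulatory_risk": 9},
    "C": {"market_size": 80, "competition": 5, "regulatory_risk": 4},
}
weights = {"market_size": 0.5, "competition": 0.3, "regulatory_risk": 0.2}
scores = score_markets(markets, weights, lower_is_better=("competition", "regulatory_risk"))
ranking = sorted(scores, key=scores.get, reverse=True)
```

Min-max scaling makes scores relative to the candidate set, so adding or removing a market can reshuffle the ranking. That is worth stating on the scorecard.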

Real-time dashboard with streaming data simulation

Reporting & BI
prompt template
Act as a BI engineer. I need to build a real-time monitoring dashboard for [use case: operations center / e-commerce live sales / server health / call center]. The data updates every [X seconds / minutes]. Build a real-time dashboard that: Simulates or connects to a streaming data source producing [describe metrics: orders per minute, error rate, active sessions, etc.] Displays a live updating line chart for the primary metric over a rolling [X minute] window Shows current-value KPI cards with delta vs. [X minutes ago] and color-coded status (green / amber / red) Triggers a visible alert when any metric crosses a [user-defined threshold] Logs all threshold breaches to a running alert table with timestamp and severity Includes a toggle to pause / resume the live feed for investigation Is deployable as a local web app requiring no external database Use Python (Dash / Streamlit / Panel) with threading or async data refresh. Provide full runnable code and a deployment guide.

Pricing elasticity experiment design

Experimentation
prompt template
Act as a pricing experimentation expert. I want to run a controlled pricing experiment to measure the price elasticity of [product / service / subscription tier]. Design the experiment by: Defining the price points to test: [anchor price, treatment prices — at least 3 levels] Calculating the required sample size per price point to detect a [X%] change in conversion rate at 80% power and 95% confidence Specifying the randomization unit: [user / session / geography / account] and justifying the choice Identifying and controlling for confounders: day of week, device, acquisition channel, user tenure Setting up a guardrail metric to detect any unintended harm (e.g., support ticket spike, refund rate increase) Designing the holdout strategy and experiment duration Producing a pre-analysis plan: primary metric, secondary metrics, statistical test, stopping rules, and decision criteria Provide a complete experiment brief in structured format ready for engineering and legal review.
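
The sample-size item can be checked by hand with the standard two-proportion z-test approximation. The sketch below compares one treatment price against the anchor with a two-sided test; the baseline rate and effect size are made up, and `sample_size_per_arm` is a hypothetical name. With three or more price points, alpha would typically be adjusted for multiple comparisons, which this sketch does not do.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_change, power=0.80, alpha=0.05):
    """Users needed per price point to detect the change with a two-sided z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_change)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Illustrative: 10% baseline conversion, detecting a 20% relative drop at a higher price.
n_per_arm = sample_size_per_arm(baseline_rate=0.10, relative_change=-0.20)
n_smaller_effect = sample_size_per_arm(baseline_rate=0.10, relative_change=-0.10)
```

Halving the detectable effect roughly quadruples the required sample, which is usually what decides how many price levels an experiment can afford.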

Data catalog and metadata management setup

Data Governance
prompt template
Act as a data governance engineer. My organization has [X] datasets across [list of systems: data warehouse, data lake, operational databases, third-party feeds] with no centralized catalog or metadata documentation. Build a data catalog foundation by: Defining a metadata schema: dataset name, owner, source system, refresh frequency, row count, last updated, sensitivity classification, primary key, description Writing a Python script to auto-inventory all tables in [database system] and populate the schema Classifying all datasets by sensitivity: public, internal, confidential, restricted Identifying datasets with no documented owner and flagging for assignment Detecting datasets that haven't been updated in [X days] and marking as potentially stale Building a searchable HTML or markdown catalog output from the inventory Recommending a governance process: who reviews the catalog, how often, and what triggers an update Provide Python code, a sample catalog output, and a governance process one-pager.

Causal inference analysis

Statistics
prompt template
Act as a causal inference specialist. I want to estimate the causal effect of [intervention: a policy change / feature launch / marketing campaign / price change] on [outcome metric], using observational data (no randomization was possible). Dataset: [X rows], columns include: [treatment indicator, outcome variable, pre-treatment covariates, time variable if panel data]. Perform a causal analysis that: Uses propensity score matching or inverse probability weighting to balance treatment and control groups, reporting covariate balance before and after Uses difference-in-differences if panel data is available, verifying the parallel trends assumption Uses instrumental variable (IV) estimation if a valid instrument exists: [describe candidate instrument] Includes a sensitivity analysis to assess how robust the estimate is to hidden confounding (Rosenbaum bounds) Reports the average treatment effect (ATE) and average treatment effect on the treated (ATT) with confidence intervals Interprets the result in plain business language: what actually caused what? Flags the key assumptions made and how likely they are to hold in this context Use Python (causalinference, econml, linearmodels). Provide code and a causal evidence summary.
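
Of the methods listed, difference-in-differences has the simplest arithmetic: the change in the treated group minus the change in the control group. The sketch below computes that 2x2 estimate on made-up rows; the field names and the function `diff_in_diff` are hypothetical. The number is only a causal estimate if the parallel trends assumption holds, and it comes without a confidence interval, which a regression-based version would supply.

```python
from statistics import mean


def diff_in_diff(rows):
    """(treated post - treated pre) - (control post - control pre)."""
    def cell_mean(treated, post):
        return mean(
            r["outcome"] for r in rows if r["treated"] == treated and r["post"] == post
        )

    treated_change = cell_mean(1, 1) - cell_mean(1, 0)
    control_change = cell_mean(0, 1) - cell_mean(0, 0)
    return treated_change - control_change


# Toy panel: two observations in each group-period cell.
rows = [
    {"treated": 0, "post": 0, "outcome": 10}, {"treated": 0, "post": 0, "outcome": 12},
    {"treated": 0, "post": 1, "outcome": 12}, {"treated": 0, "post": 1, "outcome": 14},
    {"treated": 1, "post": 0, "outcome": 11}, {"treated": 1, "post": 0, "outcome": 13},
    {"treated": 1, "post": 1, "outcome": 17}, {"treated": 1, "post": 1, "outcome": 19},
]
effect = diff_in_diff(rows)
```

The treated group rose by 6 and the control group by 2, so the estimated effect is 4: the part of the treated group's change that the shared trend does not explain.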

Automated reporting pipeline with email delivery

Workflow Automation
prompt template
Act as a data engineer. I need an automated reporting pipeline that pulls data, generates a formatted report, and emails it to [stakeholders] every [daily / weekly / monthly] on [schedule]. Build the pipeline that: Connects to [data source: PostgreSQL / BigQuery / CSV / API] and runs the required query or extraction Calculates the KPIs: [list metrics with formulas] Generates a formatted report in [PDF / Excel / HTML email] with charts, summary tables, and period-over-period comparisons Highlights metrics that are above or below target with color coding Writes a plain-English narrative summary section that auto-populates based on the data values Sends the report via email using [SMTP / SendGrid / AWS SES] to a configurable recipient list Logs each run with status (success / failure), record count, and timestamp — alerts on failure via [Slack / email] Use Python (pandas, matplotlib, smtplib or sendgrid, schedule or cron). Provide full modular code, a config file template, and a deployment checklist.

Benchmarking model performance across data slices

ML & Prediction
prompt template
Act as a responsible AI and ML evaluation specialist. I have a trained [classification / regression] model and a labeled test set with columns: [features, target, and demographic or segment fields like region, age_group, device_type, customer_tier]. Evaluate model fairness and slice performance by: Calculating overall model performance: accuracy, F1, AUC, RMSE as appropriate Breaking down performance by each segment dimension: compute the same metrics per slice Identifying slices where performance falls below [X%] of overall performance — flag these as underperforming Testing whether performance gaps between slices are statistically significant Checking for disparate impact: does the model systematically over-predict or under-predict for any group? Tracing underperforming slices back to training data: are they underrepresented or have noisier labels? Recommending remediation: targeted data collection, slice-specific retraining, or post-processing calibration Use Python (sklearn, slicefinder or manual pandas groupby). Provide code, a slice performance heatmap, and a model fairness summary memo.
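
The slice breakdown is essentially a group-by on the segment field followed by a comparison against the overall metric. The sketch below uses accuracy only, on made-up predictions; the field names and the functions `slice_accuracy` and `underperforming` are hypothetical, and the 90% ratio stands in for the prompt's [X%].

```python
from collections import defaultdict


def slice_accuracy(rows, slice_field):
    """Accuracy per distinct value of `slice_field`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[slice_field]] += 1
        hits[r[slice_field]] += int(r["y_true"] == r["y_pred"])
    return {s: hits[s] / totals[s] for s in totals}


def underperforming(rows, slice_field, ratio=0.9):
    """Return overall accuracy and the slices scoring below ratio * overall."""
    overall = sum(r["y_true"] == r["y_pred"] for r in rows) / len(rows)
    per_slice = slice_accuracy(rows, slice_field)
    return overall, {s: acc for s, acc in per_slice.items() if acc < ratio * overall}


# Toy test set: the model is right on every "north" row and half the "south" rows.
rows = (
    [{"region": "north", "y_true": 1, "y_pred": 1}] * 4
    + [{"region": "south", "y_true": 1, "y_pred": 1}] * 2
    + [{"region": "south", "y_true": 1, "y_pred": 0}] * 2
)
overall, flagged = underperforming(rows, "region")
```

With slices this small the gap could easily be noise, which is why the prompt also asks for a significance test before any slice is labeled underperforming.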

Customer journey mapping with data

Customer Analytics
prompt template
Act as a customer experience analyst. I have event-level behavioral data tracking customers across multiple touchpoints with columns: [customer_id, event_type, channel, timestamp, session_id, device_type, and any outcome flags like purchase or churn]. Map and analyze the customer journey by: Reconstructing individual customer paths by sequencing events chronologically per customer Identifying the top 10 most common journey sequences from first touch to conversion or churn Calculating average time between each stage transition and flagging stages with the longest delays Building a Sankey diagram visualizing the flow of customers across all major touchpoints Segmenting journeys by outcome: compare paths of converted customers vs. churned customers to identify divergence points Detecting the single touchpoint whose presence most strongly correlates with a positive outcome using lift analysis Recommending 3 journey optimization interventions with estimated impact on conversion rate or time-to-value Use Python (pandas, plotly). Provide code, the Sankey diagram, and a journey insight summary for the CX or product team.

Dimensionality reduction and visualization

Advanced Analytics
prompt template
Act as a data scientist. I have a high-dimensional dataset with [X features] and [X rows] representing [customers / products / documents / sensor readings]. The data is difficult to interpret or visualize in its raw form. Apply dimensionality reduction by: Preprocessing all features: scale numerics, encode categoricals, handle missing values Applying PCA and retaining components that explain at least [80% / 90%] of variance — plot the scree plot and cumulative explained variance Applying t-SNE with perplexity values of [5, 30, 50] and comparing the resulting cluster structures Applying UMAP with [min_dist = 0.1, n_neighbors = 15] as an additional comparison Coloring all 2D projections by [a known label or segment field] to assess whether structure aligns with business categories Identifying any clear clusters or outlier groups visible in the reduced space and describing their characteristics in original feature terms Recommending which reduction method best preserves the structure relevant to [your downstream task: clustering / anomaly detection / visualization] and explaining why Use Python (scikit-learn, umap-learn, plotly). Provide code and a side-by-side comparison of all three projections with interpretation.
