60 Best ChatGPT Prompts for Data Analysis
Looking for the right ChatGPT prompts for data analysis? This collection of 60 prompts helps you clean messy data, spot trends, and turn raw numbers into clear, decision-ready insights, fast.
Prompt 01
Full exploratory data analysis (EDA)
Exploration
prompt template
Act as a senior data analyst. I have a dataset with the following columns: [list your columns and data types]. The dataset contains [X rows] and covers [time period or scope].
Perform a comprehensive EDA that includes:
A summary of the data (shape, types, missing values, duplicates)
Descriptive statistics for all numeric columns (mean, median, std, min, max, quartiles)
Distribution analysis — identify skewed or bimodal variables
Correlation matrix — highlight the top 5 strongest correlations with [target variable]
Outlier detection using IQR and z-score methods
Key patterns, anomalies, or surprising findings
A prioritized list of 3–5 next analytical steps you recommend
Use Python (pandas, seaborn, matplotlib). Provide clean, commented code and a plain-English summary after each section.
Prompt 02
Intelligent data cleaning pipeline
Data Cleaning
prompt template
You are a data engineering expert. I have a messy dataset in [CSV / Excel / SQL table] format with these known quality issues:
Missing values in columns: [list columns]
Suspected duplicates in: [column or row identifiers]
Inconsistent formats in: [e.g., date columns, categorical labels]
Possible outliers in: [numeric columns]
Build a full data cleaning pipeline in Python that:
Detects and reports missing values with percentage per column
Imputes missing values using the most appropriate strategy per column and explains why
Removes or flags exact and fuzzy duplicates
Standardizes date formats to YYYY-MM-DD
Normalizes categorical columns (lowercase, strip whitespace, map variants to canonical labels)
Caps or removes outliers based on IQR or z-score, keeping a change log
Exports the cleaned dataset and a cleaning summary report
Flag any decisions that require my judgment before applying them.
Prompt 03
Hypothesis testing with interpretation
Statistics
prompt template
Act as a statistician. I want to test whether [specific hypothesis, e.g., "customers who receive email campaigns convert at a higher rate than those who don't"].
Dataset context:
Group A (control): [N = X, conversion rate or mean = Y]
Group B (treatment): [N = X, conversion rate or mean = Y]
Metric type: [binary / continuous / ordinal]
Do the following:
Recommend the correct statistical test (t-test, chi-square, Mann-Whitney U, ANOVA, etc.) and justify the choice
State the null and alternative hypotheses clearly
Check all test assumptions with code
Run the test and return: test statistic, p-value, confidence interval, and effect size
Interpret the result in plain business language
Warn me of any limitations or risks of the conclusion
Use Python (scipy, statsmodels). Include visualizations of the distributions.
Prompt 04
Strategic data visualization plan
Visualization
prompt template
You are a data visualization expert. I need to present findings from a [type of analysis] to [audience: executive team / technical stakeholders / clients].
My dataset has: [describe key columns and metrics]
My main message is: [the single most important finding]
For each of 5 chart types, tell me:
What question it answers
The exact chart type and why
Which columns go on which axis
What color encoding or annotation would add insight
Common mistakes to avoid
Then provide Python code (matplotlib or plotly) for the 2 most impactful charts, with titles, axis labels, and a one-sentence insight annotation on the chart.
Prompt 05
Predictive model build and evaluation
ML & Prediction
prompt template
Act as a machine learning engineer. I want to predict [target variable, e.g., customer churn / revenue / equipment failure].
Dataset:
Features: [list feature names and types]
Target: [name, type: binary / multiclass / continuous]
Size: [rows x columns]
Class imbalance (if classification): [e.g., 90% / 10%]
Build a complete ML pipeline that:
Splits data into train (70%), validation (15%), and test (15%) sets with stratification
Preprocesses features in-pipeline
Trains and compares at least 3 models
Tunes the best model using cross-validated grid search
Evaluates on the test set with all relevant metrics and plots
Explains top 10 features using SHAP values
Saves the final pipeline as a .pkl file
Prompt 06
SQL query optimization for large datasets
SQL & Databases
prompt template
Act as a senior database engineer. I have a SQL query that is running slowly on a table with [X million rows]. Here is the current query: [paste query].
Analyze and optimize it by:
Identifying all performance bottlenecks (full table scans, missing indexes, unnecessary joins)
Rewriting the query with optimizations explained line by line
Recommending indexes to create and explaining the trade-offs
Estimating the expected performance improvement
Suggesting query execution plan analysis steps (EXPLAIN / EXPLAIN ANALYZE)
Database: [PostgreSQL / MySQL / BigQuery / Snowflake]. Include before and after versions.
Prompt 07
Time series forecasting
Forecasting
prompt template
Act as a time series analyst. I have [X months / years] of [metric, e.g., weekly sales, daily active users] data with columns: [date column, value column, any other relevant columns].
Build a forecasting pipeline that:
Decomposes the series into trend, seasonality, and residual components
Tests for stationarity (ADF test) and applies differencing if needed
Identifies ARIMA parameters using ACF and PACF plots
Trains and compares: ARIMA, SARIMA, and Prophet
Evaluates models using RMSE, MAE, and MAPE on a held-out test set
Generates a [X weeks / months] forecast with 80% and 95% confidence intervals
Plots actuals vs. forecast with confidence bands
Provide Python code and flag any seasonality patterns or anomalies found.
Prompt 08
Customer segmentation analysis
Segmentation
prompt template
You are a customer analytics expert. I have a dataset of [X customers] with the following behavioral and demographic features: [list features].
Perform a full customer segmentation that:
Selects the most relevant features and explains the selection rationale
Scales and preprocesses the data appropriately
Determines the optimal number of clusters using the elbow method and silhouette score
Runs K-Means and DBSCAN and compares results
Profiles each segment: size, key characteristics, average metrics
Visualizes clusters using PCA or UMAP for 2D representation
Recommends a business action for each segment (e.g., upsell, retain, reactivate)
Use Python (scikit-learn, plotly). Provide both code and a plain-English segment summary.
Prompt 09
Cohort retention analysis
Product Analytics
prompt template
Act as a product analyst. I have event-level data with columns: [user_id, event_date, event_type, and any other relevant fields]. I want to measure user retention by cohort.
Build a cohort analysis that:
Defines cohorts by [first purchase date / signup month / first login]
Calculates monthly retention rates for each cohort up to [X months]
Generates a cohort retention heatmap (cohorts as rows, months as columns)
Identifies which cohorts have the strongest and weakest retention
Calculates average Day 1, Day 7, and Day 30 retention across all cohorts
Flags any cohort that drops below [X%] retention threshold and suggests possible causes
Recommends 2–3 product or lifecycle actions based on the retention pattern
Use Python (pandas, seaborn). Output the heatmap and a written commentary.
Prompt 010
A/B test results analysis
Experimentation
prompt template
Act as an experimentation analyst. I ran an A/B test with the following setup:
Experiment name: [name]
Control group (A): [N = X users, metric = Y]
Treatment group (B): [N = X users, metric = Y]
Primary metric: [conversion rate / revenue per user / click-through rate]
Secondary metrics: [list any]
Test duration: [X days]
Analyze the results by:
Checking for sample ratio mismatch (SRM)
Running the appropriate significance test with correction for multiple metrics
Calculating p-value, confidence interval, and minimum detectable effect
Checking for novelty effect or time-based degradation
Assessing practical significance — is the lift worth shipping?
Recommending a clear ship / no-ship / run longer decision with justification
Include Python code and a one-page summary suitable for a product review meeting.
Prompt 011
Funnel drop-off analysis
Product Analytics
prompt template
You are a conversion optimization analyst. I have funnel data tracking users across [X steps, e.g., Landing Page > Sign Up > Onboarding > First Purchase]. The dataset has columns: [user_id, step_name, timestamp, any segment fields].
Analyze funnel drop-off by:
Calculating conversion rate at each step overall
Breaking down conversion by [segment: device / channel / region / user type]
Identifying the single biggest drop-off point and quantifying the revenue or user impact
Detecting if drop-off has worsened or improved over [time period]
Running statistical significance tests on segment-level differences
Building a Sankey diagram or funnel chart visualization
Recommending 3 specific, prioritized experiments to improve the worst-performing step
Use Python or SQL. Provide code and a prioritized action plan.
Prompt 012
Churn prediction model
ML & Prediction
prompt template
Act as an ML engineer specializing in retention. I want to build a churn prediction model for [SaaS / e-commerce / subscription] customers.
Dataset:
Rows: one per customer
Features: [list behavioral, usage, demographic, and billing features]
Label: [churned = 1, active = 0]
Imbalance: [X% churned]
Prediction horizon: [e.g., predict churn in next 30 days]
Build a pipeline that:
Engineers relevant features (recency, frequency, tenure, usage trends)
Handles class imbalance using SMOTE or class weighting
Trains Logistic Regression, Random Forest, and XGBoost with cross-validation
Selects the threshold that maximizes F1 or recall based on business cost of false negatives
Outputs a scored customer list ranked by churn probability
Explains drivers for the top 10 highest-risk customers using SHAP force plots
Include code, evaluation metrics, and a decision memo for the retention team.
Prompt 013
Revenue attribution analysis
Business Analytics
prompt template
Act as a revenue analyst. I need to attribute [total revenue / MRR / ARR] across [channels / products / regions / sales reps] for [time period].
Dataset columns: [list relevant fields]
Perform attribution analysis that:
Calculates total and per-unit revenue by each dimension
Identifies the top 20% of drivers contributing 80% of revenue (Pareto analysis)
Computes month-over-month and year-over-year growth rates per segment
Detects any underperforming segments relative to target or prior period
Builds a contribution waterfall chart showing what drove overall revenue change
Flags anomalies (sudden spikes or drops) with a possible explanation
Provides a one-paragraph executive summary suitable for a board update
Use Python or SQL. Output charts and a narrative summary.
Prompt 014
Anomaly detection in operational data
Anomaly Detection
prompt template
You are a data scientist specializing in anomaly detection. I have [time series / transactional / sensor] data with columns: [list fields]. I need to identify unusual patterns that may indicate [fraud / equipment failure / data pipeline errors / sudden business changes].
Build a detection system that:
Applies statistical control charts (mean ± 2 or 3 sigma) as a baseline
Implements Isolation Forest and LOF for multivariate anomaly detection
Flags anomalies with a severity score (low / medium / high) based on deviation magnitude
Plots anomalies in context on a time series chart with highlighted points
Groups anomalies by type or likely root cause where possible
Recommends an alerting threshold strategy for production use
Outputs an anomaly log with timestamp, affected column, deviation value, and severity
Use Python (scikit-learn, pyod). Provide code and a summary of the most critical anomalies found.
Prompt 015
Data pipeline audit and quality report
Data Quality
prompt template
Act as a data quality engineer. I need to audit the data flowing through [pipeline name or description] and generate a quality report for stakeholders.
The pipeline processes [X records per day] from [source system] to [destination system / data warehouse]. Known or suspected issues include: [describe any known problems].
Perform an audit that:
Profiles each key table or dataset (row counts, column completeness, cardinality)
Detects schema drift between source and destination
Validates business rules: [list 3–5 rules, e.g., "no negative revenue", "user_id must be unique"]
Measures freshness: time since last update per table
Identifies referential integrity violations across joined tables
Scores overall data quality on a 1–10 scale with breakdown by dimension
Generates a stakeholder-ready report with a priority-ranked issue list
Use Python (Great Expectations or pandas). Provide code and a formatted report template.
Prompt 016
Market basket analysis
Retail & E-commerce Analytics
prompt template
You are a retail data analyst. I have transaction data with columns: [transaction_id, customer_id, product_id, product_name, quantity, date]. I want to find product associations to improve cross-sell and bundling strategies.
Perform market basket analysis that:
Transforms data into a basket format (one row per transaction, columns per product)
Runs the Apriori algorithm with minimum support: [X%] and confidence: [X%]
Filters and ranks association rules by lift, confidence, and support
Identifies the top 10 product pairs or triplets with the strongest lift
Visualizes the association network as a graph
Translates findings into 3 specific bundling or upsell recommendations
Estimates potential revenue uplift if the top recommendation is implemented
Use Python (mlxtend). Include code and a business-ready summary table.
Prompt 017
Price sensitivity and elasticity analysis
Pricing Analytics
prompt template
Act as a pricing analyst. I have historical data on [product / service] sales including: [price, quantity sold, date, segment or region, and any promotion flags].
Analyze price sensitivity and elasticity by:
Plotting demand curve (price vs. quantity) for each key segment
Calculating price elasticity of demand (PED) with confidence intervals
Identifying the price point that maximizes revenue vs. volume
Segmenting customers by price sensitivity (elastic vs. inelastic buyers)
Testing whether promotions have a lasting or temporary demand effect
Building a simple price optimization model using regression
Recommending 2–3 pricing strategy changes with projected impact
Use Python (statsmodels, scipy). Provide code, charts, and a pricing recommendation memo.
Prompt 018
Employee attrition analysis
HR Analytics
prompt template
You are an HR analytics expert. I have employee data with columns: [employee_id, department, role, tenure, salary, performance_rating, attrition (yes/no), engagement_score, manager_id, and others].
Analyze attrition by:
Calculating overall and department-level attrition rates
Identifying which roles, tenure bands, and pay grades have the highest attrition
Running a logistic regression to find the top 5 drivers of attrition
Building a survival analysis to estimate expected tenure by employee segment
Flagging high-risk current employees based on the model (top 10%)
Comparing attrition drivers vs. industry benchmarks where possible
Recommending 3 targeted retention interventions with estimated cost vs. impact
Use Python. Provide code, charts, and a summary report for the CHRO.
Prompt 019
Geospatial data analysis
Geospatial Analytics
prompt template
Act as a geospatial data analyst. I have a dataset with [X records] containing location data (latitude, longitude or address) along with: [other relevant fields like sales, incidents, customers, assets].
Perform geospatial analysis that:
Geocodes addresses to coordinates if needed
Plots all records on an interactive map with color-coded markers by [metric or category]
Performs clustering to identify geographic hotspots using DBSCAN or KMeans
Calculates distance-based metrics (e.g., nearest store, service area coverage)
Builds a choropleth map by [region / zip code / city] for [key metric]
Identifies underserved or overserved geographic areas
Recommends 2–3 location-based strategic actions based on the findings
Use Python (geopandas, folium, plotly). Provide code and an interactive HTML map output.
Prompt 020
Natural language processing on survey data
NLP & Text Analytics
prompt template
You are an NLP analyst. I have [X responses] from a customer or employee survey. The open-ended question was: "[paste question text]". Responses are in column [column name].
Analyze the text data by:
Cleaning the text (lowercasing, stopword removal, lemmatization)
Running sentiment analysis on each response (positive / negative / neutral) with confidence score
Extracting the top 20 keywords and bigrams using TF-IDF
Identifying 5–8 themes using topic modeling (LDA or BERTopic)
Grouping responses by sentiment x theme matrix
Flagging the most critical negative responses for immediate review
Generating a one-page executive summary of what respondents care most about
Use Python (NLTK, transformers, gensim). Provide code and the summary output.
Prompt 021
Financial forecasting and variance analysis
Finance Analytics
prompt template
Act as a financial analyst. I have [monthly / quarterly] actuals vs. budget data for [business unit / product / cost center] covering [time period]. Columns include: [revenue, COGS, gross margin, opex line items, net income, and period].
Perform a full variance analysis that:
Calculates actual vs. budget variance in absolute and percentage terms for each line item
Identifies the top 3 drivers of favorable and unfavorable variance
Builds a bridge (waterfall) chart showing how each variance contributed to the net income gap
Runs a rolling 12-month forecast using actuals trend extrapolation
Flags any line items where variance exceeds [X%] threshold for escalation
Performs scenario analysis: best case, base case, worst case for next [X quarters]
Produces a CFO-ready summary slide narrative with headline numbers and commentary
Use Python or Excel. Provide code or formulas and the narrative template.
Prompt 022
Root cause analysis with data
Diagnostics
prompt template
Act as an analytical problem-solver. A key metric ([metric name, e.g., conversion rate / NPS / order volume]) dropped by [X%] between [date A] and [date B]. I need to identify the root cause.
Available data: [describe datasets available — e.g., web analytics, transaction logs, CRM data, support tickets].
Perform a data-driven root cause analysis by:
Confirming the metric drop is real (not a tracking error or data pipeline issue)
Breaking the metric down by all available dimensions (segment, region, channel, product)
Isolating which dimension(s) account for most of the drop using a contribution analysis
Analyzing whether the issue started suddenly or gradually
Cross-referencing with events log (releases, campaigns, external events) for that period
Forming and testing the top 3 hypotheses with statistical evidence
Recommending a corrective action for the most likely root cause with an implementation timeline
Provide a structured RCA report with supporting charts and evidence.
Prompt 023
Dashboard design and KPI framework
Reporting & BI
prompt template
You are a BI analyst and dashboard designer. I need to build an executive dashboard for [business function: marketing / sales / operations / finance] tracking performance for [audience: C-suite / department head / team lead].
The business goal is: [describe the decision the dashboard should support].
Available data sources: [list systems, e.g., Salesforce, Google Analytics, ERP].
Key metrics I track today: [list current metrics].
Design the dashboard by:
Recommending a KPI hierarchy (north star metric, secondary metrics, diagnostic metrics)
Mapping each metric to a specific business question it answers
Specifying the best visualization type for each metric and why
Designing the layout: which metrics appear on top, how to group related KPIs
Defining alert thresholds for each KPI
Recommending refresh frequency (real-time / daily / weekly) per metric
Providing a mockup description or wireframe in text format ready for a BI tool like Tableau, Power BI, or Looker
Include a one-paragraph rationale for the overall design decisions.
Prompt 024
Data-driven marketing attribution
Marketing Analytics
prompt template
Act as a marketing data analyst. I need to attribute [revenue / conversions / sign-ups] across the following channels: [email, paid search, organic, social, direct, referral]. I have user-level touchpoint data with columns: [user_id, channel, touchpoint_date, conversion_flag, conversion_value].
Build a multi-touch attribution analysis that:
Compares four models: first touch, last touch, linear, and time decay
Calculates attributed revenue and conversion count per channel under each model
Visualizes how attribution share shifts across models for each channel
Identifies which channels are undervalued or overvalued by last-touch attribution
Recommends a data-driven attribution model using Shapley values
Estimates the incremental ROAS (return on ad spend) per channel
Produces a channel investment reallocation recommendation based on findings
Use Python. Provide code and a channel performance scorecard.
Prompt 025
Supply chain analytics and demand forecasting
Operations Analytics
prompt template
You are a supply chain analyst. I have [X months] of historical demand data with columns: [product_id, product_name, date, units_sold, units_returned, inventory_level, lead_time_days, stockout_flag].
Perform a supply chain analysis that:
Calculates demand variability (CV) per product to classify into fast / slow / erratic movers
Builds a demand forecast for each product for the next [X weeks / months] using appropriate model (moving average, exponential smoothing, or ML-based)
Calculates safety stock levels using service level target of [X%]
Identifies products at risk of stockout in the next [X days] given current inventory
Flags slow-moving or dead stock exceeding [X days] of inventory cover
Recommends reorder points and order quantities per SKU
Estimates the working capital impact of the recommended inventory policy vs. current policy
Use Python. Provide code and a prioritized reorder action list.
Prompt 026
Regression analysis for business drivers
Statistics
prompt template
Act as a quantitative analyst. I want to understand what drives [outcome variable, e.g., customer lifetime value / monthly revenue / support ticket volume] using regression analysis.
Dataset: [X rows], columns include: [list predictor variables and the outcome variable].
Run a full regression analysis that:
Checks assumptions: linearity, normality of residuals, homoscedasticity, multicollinearity (VIF)
Runs OLS regression and interprets each coefficient in plain business terms
Identifies and removes or handles outliers and high-leverage points
Tests for interaction effects between [specific variable pairs]
Uses stepwise or LASSO feature selection to find the minimal predictive model
Reports R-squared, adjusted R-squared, F-statistic, and AIC/BIC
Produces a coefficient plot with confidence intervals and a plain-English interpretation of each significant predictor
Use Python (statsmodels, sklearn). Provide code and a business-friendly results table.
Prompt 027
Sentiment analysis on product reviews
NLP & Text Analytics
prompt template
You are an NLP specialist. I have [X product reviews] for [product name] from [platform, e.g., Amazon, G2, App Store]. The dataset has columns: [review_text, rating, date, verified_purchase, and any other metadata].
Analyze reviews by:
Classifying sentiment at review level (positive / negative / neutral)
Extracting aspect-level sentiment: identify how customers feel about [price, quality, delivery, support, UX, etc.] separately
Tracking sentiment trend over [time period] — is sentiment improving or declining?
Identifying the top 10 most common complaints and the top 10 most praised features
Flagging reviews with high impact (verified + long + negative) for product team review
Comparing sentiment distribution across rating levels (do 3-star reviews skew more negative on which aspects?)
Generating a product feedback summary memo with prioritized action items for the product team
Use Python (transformers / VADER / spaCy). Provide code and the final memo.
Prompt 028
Network analysis for business relationships
Advanced Analytics
prompt template
Act as a network analyst. I have relational data representing [business relationships: customer referrals / supplier connections / employee collaboration / transaction networks]. The dataset has: [node_1, node_2, relationship_type, weight or frequency, date].
Perform a network analysis that:
Builds a directed or undirected graph from the data
Calculates centrality measures: degree, betweenness, closeness, and PageRank for each node
Identifies the top 10 most influential nodes and explains what their centrality means in business terms
Detects communities or clusters within the network
Finds any critical bridge nodes whose removal would fragment the network
Visualizes the network with node size proportional to influence and color by community
Recommends 2–3 strategic actions based on the network structure (e.g., key account prioritization, risk nodes, partnership opportunities)
Use Python (networkx, plotly). Provide code and a strategic summary.
Prompt 029
Lifetime value (LTV) modeling
Customer Analytics
prompt template
Act as a customer analytics expert. I need to calculate and model customer lifetime value (LTV) for [business type, e.g., SaaS, e-commerce, subscription box].
Dataset: [columns including customer_id, acquisition_date, orders or payments with amounts, churn_date if applicable, acquisition_channel, plan or product type].
Build a full LTV model that:
Calculates historical LTV for each customer (sum of gross margin over lifetime)
Segments customers into LTV tiers: top 10%, mid 40%, bottom 50%
Builds a predictive LTV model using BG/NBD + Gamma-Gamma (for non-contractual) or regression (for contractual)
Compares predicted LTV by acquisition channel and cohort
Calculates payback period per channel: LTV vs. CAC
Identifies the customer attributes most correlated with high LTV
Recommends acquisition and retention strategies for each LTV segment
Use Python (lifetimes library or custom). Provide code, a LTV distribution chart, and a strategic summary.
Prompt 030
Data storytelling and presentation builder
Communication & Reporting
prompt template
Act as a data storytelling expert. I have the following analysis results to present: [paste your key findings, charts, or tables].
My audience is: [e.g., CEO, VP of Marketing, board of directors]
Their key question is: [e.g., "Should we expand into the EMEA market?" / "Is our growth sustainable?"]
Decision to be made: [describe the decision]
Build a data-driven narrative that:
Opens with a single headline insight that directly answers the audience's key question
Structures the story as: situation → complication → resolution (SCR framework)
Selects the 3 most compelling data points and writes a plain-English sentence for each
Recommends which charts to include and where in the narrative flow
Writes transitions between sections that maintain analytical logic
Anticipates the top 3 questions the audience will ask and provides pre-drafted answers with supporting data
Ends with a clear call to action tied to the business decision
Produce the full narrative script and a slide-by-slide outline.
Prompt 031
Inventory optimization analysis
Operations Analytics
prompt template
You are an operations research analyst. I have inventory and sales data with columns: [product_id, sku, current_stock, average_daily_demand, demand_std_dev, lead_time_days, holding_cost_per_unit, stockout_cost_per_unit, order_cost].
Optimize my inventory policy by:
Calculating EOQ (Economic Order Quantity) for each SKU
Computing safety stock at [90% / 95% / 99%] service levels
Setting reorder point (ROP) for each SKU
Identifying SKUs where current stock is critically below ROP
Flagging overstock items (current stock > [X days] of demand cover) and calculating holding cost impact
Performing ABC-XYZ classification: value x demand variability
Recommending differentiated replenishment policies by ABC-XYZ class with estimated cost savings vs. current policy
Use Python. Provide code and a prioritized reorder action table.
Prompt 032
Competitive benchmarking with data
Business Analytics
prompt template
Act as a competitive intelligence analyst. I have data on [X competitors] across metrics like: [revenue growth, market share, pricing, NPS, product features, employee count, Glassdoor rating, ad spend, etc.]. Sources include: [public reports, web scrapes, industry databases, survey data].
Build a competitive benchmarking analysis that:
Normalizes all metrics to a common scale for fair comparison
Creates a competitive positioning matrix (2x2 or radar chart) using [X vs. Y dimensions]
Identifies areas where our company leads, is at parity, or lags behind
Scores each competitor on a weighted KPI scorecard
Highlights metrics where competitors have improved fastest over the past [X quarters]
Identifies one "white space" opportunity no competitor is currently winning
Produces a one-page competitive summary with a strategic priority list
Use Python or Excel. Provide code and the formatted output.
Prompt 033
Social media analytics and performance reporting
Marketing Analytics
prompt template
You are a social media analytics expert. I have platform data from [LinkedIn / Instagram / Twitter / TikTok / YouTube] with columns: [post_id, date, format, impressions, reach, engagements, clicks, conversions, follower_count, post_copy].
Build a performance analysis that:
Calculates engagement rate, CTR, and CPE (cost per engagement if paid data available) per post
Identifies the top 10 and bottom 10 performing posts by [primary KPI]
Analyzes performance by content format (video vs. image vs. carousel vs. text)
Tracks follower growth rate and correlates spikes with specific posts or campaigns
Detects the best posting day and time for [reach / engagement / conversions]
Compares [current period] vs. [prior period] performance across all metrics
Recommends a content calendar strategy for next [month / quarter] based on findings
Use Python. Provide code and a monthly performance report template.
Prompt 034
Fraud detection model
ML & Prediction
prompt template
Act as a fraud analytics specialist. I have transaction data with columns: [transaction_id, user_id, amount, merchant_category, timestamp, device_type, location, is_fraud (label)]. The fraud rate is approximately [X%].
Build a fraud detection system that:
Performs feature engineering: transaction velocity, amount deviation from user baseline, geographic anomalies, time-of-day patterns
Handles extreme class imbalance using SMOTE, class weighting, and anomaly detection approaches
Trains and compares: Logistic Regression, Random Forest, XGBoost, and Isolation Forest
Optimizes for recall (minimize false negatives / missed fraud) while keeping precision acceptable
Selects optimal classification threshold based on the cost matrix: [false negative cost = $X, false positive cost = $Y]
Explains flagged transactions using SHAP local explanations
Outputs a real-time scoring function that takes a new transaction and returns a fraud probability and risk tier
Provide Python code, evaluation metrics, and a risk tier decision guide.
Prompt 035
Web analytics and conversion optimization
Digital Analytics
prompt template
Act as a web analytics expert. I have Google Analytics or similar data covering [time period] for [website / app]. Available dimensions include: [sessions, users, bounce rate, pages per session, goal completions, revenue, traffic source, device type, landing page, and others].
Perform a full web analytics audit that:
Identifies top traffic sources by volume and by conversion rate
Finds the highest-traffic pages with the lowest conversion rates (biggest opportunity pages)
Analyzes mobile vs. desktop performance gap across all key metrics
Builds a session quality score using engagement proxies (time on page, pages per session, scroll depth)
Detects pages with unusually high exit rates and flags them for UX review
Identifies the content or landing pages most correlated with eventual conversion
Recommends a prioritized CRO (conversion rate optimization) experiment roadmap with estimated impact
Use Python (or GA4 API). Provide code and a CRO priority matrix.
Prompt 036
Bayesian A/B test analysis
Experimentation
prompt template
Act as a Bayesian statistician. I have results from an A/B test:
Control conversions: [X] out of [N]
Treatment conversions: [X] out of [N]
Perform a Bayesian analysis that:
Sets up a Beta-Binomial model with an appropriate prior (justify your choice)
Calculates the posterior distribution for each variant's conversion rate
Plots both posterior distributions on one chart with credible intervals
Computes the probability that B is better than A
Calculates the expected loss from choosing each variant
Determines if a decision can be made now or if more data is needed
Summarizes the result in plain language for a non-technical product manager
Use Python (pymc or scipy). Provide code, the posterior plot, and a decision recommendation.
Prompt 037
Demographic analysis and segmentation
Customer Analytics
prompt template
You are a market research analyst. I have demographic survey data with columns: [age, gender, income_bracket, education_level, location, product_usage_frequency, satisfaction_score, NPS, and others].
Perform a demographic analysis that:
Profiles the overall respondent base (distributions for each demographic variable)
Cross-tabulates satisfaction and NPS by each demographic segment
Identifies which demographic groups have the highest and lowest satisfaction
Tests whether demographic differences in satisfaction are statistically significant
Builds a persona for the top 3 most engaged demographic segments
Maps demographic segments to product or marketing recommendations
Identifies any underrepresented demographic segments and flags potential survey bias
Use Python. Provide code, a segment summary table, and 3 persona write-ups.
Prompt 038
Predictive maintenance analysis
Operations Analytics
prompt template
Act as a reliability engineer and data scientist. I have sensor or operational data from [X machines / assets] with columns: [asset_id, timestamp, sensor readings (temperature, vibration, pressure, etc.), maintenance_logs, failure_flag].
Build a predictive maintenance model that:
Engineers time-windowed features: rolling mean, rolling std, rate of change per sensor over [X hours / days]
Labels failure precursor windows: flag data [X hours] before each failure event
Trains a classifier to predict failure probability in the next [X hours]
Evaluates using precision, recall, and lead time — how early does the model detect failure?
Identifies which sensors are the strongest predictors using SHAP
Builds a maintenance scheduling rule: trigger alert when probability exceeds [X%]
Estimates cost savings from predicted vs. reactive maintenance based on [downtime cost per hour]
Use Python (scikit-learn, tsfresh). Provide code, a confusion matrix, and a cost-benefit summary.
Prompt 039
Cross-sell and upsell opportunity analysis
Revenue Analytics
prompt template
Act as a revenue analytics specialist. I have purchase history data with columns: [customer_id, product_id, product_category, purchase_date, revenue, customer_segment, account_manager].
Identify cross-sell and upsell opportunities by:
Calculating product penetration rate by customer segment
Identifying which products are most commonly purchased together (use association rules)
Finding customers who buy Product A but not Product B, where B has high affinity with A
Scoring each customer-product pair for upsell potential using purchase recency, frequency, and fit
Ranking the top 50 accounts by estimated expansion revenue opportunity
Mapping each opportunity to a recommended outreach action and suggested offer
Estimating total addressable expansion revenue if top 20% of opportunities are converted
Use Python. Provide code and a prioritized account expansion list ready for the sales team.
Prompt 040
Multivariate testing analysis
Experimentation
prompt template
Act as an experimentation analyst. I ran a multivariate test on [landing page / email / ad] with the following element variants:
Element 1 (headline): [A, B, C]
Element 2 (CTA button): [A, B]
Element 3 (hero image): [A, B]
Primary metric: [conversion rate]
Total sessions: [X] split across [N] combinations
Analyze the results by:
Calculating conversion rate and statistical significance for each combination
Isolating the marginal effect of each element using factorial analysis
Identifying the winning combination and quantifying its improvement over control
Checking for interaction effects between elements
Correcting for multiple comparisons using Bonferroni or FDR correction
Estimating the revenue impact of deploying the winning combination
Recommending which elements to lock in and which need further testing
Use Python. Provide code and a results summary table.
Prompt 041
Exploratory analysis of unstructured log data
Engineering Analytics
prompt template
You are a data engineer and analyst. I have server or application log files in [format: plain text, JSON, syslog] covering [time period]. The logs contain fields like: [timestamp, severity level, service name, user_id, error_code, message text].
Parse and analyze the logs by:
Parsing raw log lines into a structured DataFrame
Counting error frequency by type, service, and time
Identifying recurring error patterns or error bursts using time bucketing
Correlating error spikes with deployment events or traffic surges
Extracting the most common error messages using clustering or frequency analysis
Identifying users or sessions most frequently encountering errors
Producing an operational health report: top 5 critical issues, their frequency, and recommended fixes
Use Python (pandas, regex). Provide log parsing code and the health report output.
Prompt 042
Sales pipeline and forecast analysis
Sales Analytics
prompt template
Act as a sales operations analyst. I have CRM pipeline data with columns: [deal_id, stage, amount, close_date, owner, product_line, lead_source, created_date, days_in_stage, probability].
Analyze the pipeline by:
Calculating pipeline coverage ratio: total pipeline value vs. quota for [quarter]
Identifying deals at risk: past expected close date, stuck in stage for > [X days], or low engagement signals
Building a weighted forecast: sum of (amount x probability) vs. a regression-based forecast
Analyzing win rate and average deal size by stage, rep, product line, and lead source
Detecting pipeline leakage: deals that moved backward in stage or were suddenly lost
Forecasting end-of-quarter attainment under three scenarios (commit / upside / downside)
Recommending 3 pipeline actions for the sales manager to take this week
Use Python or SQL. Provide code and a deal risk heat map.
Prompt 043
Data anonymization and privacy compliance check
Data Governance
prompt template
Act as a data privacy engineer. I have a dataset I need to share externally (with a vendor / for analytics / for research) that contains potentially sensitive information. Columns include: [list all columns].
Perform a privacy compliance review and anonymization that:
Classifies each column by sensitivity level: PII, quasi-identifier, sensitive attribute, non-sensitive
Identifies re-identification risk from combinations of quasi-identifiers (k-anonymity check)
Applies appropriate anonymization: masking, pseudonymization, generalization, or suppression per column
Verifies the anonymized dataset meets k-anonymity where k >= [3 / 5 / 10]
Tests that anonymized data preserves the statistical properties needed for the intended analysis
Documents all transformations for compliance records
Produces a privacy impact assessment summary suitable for a DPO review
Use Python. Provide code and the compliance documentation template.
Prompt 044
Cluster profiling and business interpretation
Segmentation
prompt template
Act as a customer analytics expert. I have already run K-Means clustering and produced [X clusters] on a dataset with columns: [list features used]. Each customer now has a cluster label assigned.
Profile and interpret the clusters by:
Calculating the mean and median of each feature by cluster
Identifying the top 3 distinguishing features per cluster using ANOVA F-test or feature importance
Writing a plain-English persona for each cluster: name, key traits, likely behaviors, and needs
Visualizing clusters using a radar chart overlaid for all segments
Sizing each cluster: count, percentage of total, and share of [revenue / orders / usage]
Mapping each cluster to the most appropriate product, pricing, or marketing strategy
Recommending a measurement plan to track how cluster membership changes over time
Provide Python code and a cluster profile card for each segment.
Prompt 045
Data audit for a new dataset
Data Quality
prompt template
Act as a data analyst onboarding a new dataset. I have just received a [CSV / database table / JSON file] from [source: a vendor / internal team / survey platform] that I have never worked with before. The dataset has [X rows] and [X columns].
Perform a thorough first-pass audit that:
Inventories all columns: name, data type, sample values, and suspected meaning
Calculates missing value rates and flags columns above [X%] missing
Identifies suspicious values: nulls coded as strings, dates out of expected range, negative values in non-negative fields
Detects likely duplicate records using fuzzy matching on [key identifier columns]
Validates value distributions against business expectations (e.g., "age should be between 18 and 90")
Assesses data freshness and coverage: what time period does this actually cover?
Produces a data dictionary draft and a data quality scorecard
Use Python. Provide code and the data dictionary + scorecard output.
Prompt 046
Model monitoring and drift detection
ML Operations
prompt template
Act as an MLOps engineer. A predictive model was deployed [X weeks / months] ago. I now have [X weeks] of production prediction logs with columns: [timestamp, input_features, predicted_label or score, actual_label if available].
Set up model monitoring by:
Detecting feature drift: compare production feature distributions vs. training distributions using KS test and PSI
Detecting label drift: compare production prediction score distributions over time
Monitoring model accuracy: calculate performance metrics on any labeled production data available
Identifying underperforming data slices: are there specific segments where accuracy has degraded?
Setting up statistical process control charts for key metrics with alert thresholds
Producing a model health dashboard with weekly rollup
Recommending a retraining trigger policy based on drift severity observed
Use Python (evidently, scipy). Provide code and the monitoring dashboard spec.
Prompt 047
Pricing tier and packaging analysis
Pricing & Revenue Analytics
prompt template
Act as a pricing strategist. I have data on [X customers] across [X pricing tiers or product packages] including: [customer_id, tier, mrr or arr, usage metrics, feature adoption flags, support ticket count, nps, churn_flag].
Analyze pricing tier performance by:
Calculating revenue, churn rate, NPS, and support burden by tier
Identifying which tier has the highest LTV and lowest churn
Detecting customers who are under-tiered (high usage relative to their plan)
Detecting customers who are over-tiered (low usage relative to their plan)
Analyzing feature adoption: which features differentiate power users from low-usage customers?
Running a willingness-to-pay analysis if survey or pricing experiment data is available
Recommending a pricing tier restructuring with projected revenue impact
Use Python. Provide code and a tier performance scorecard.
Prompt 048
End-to-end analytics workflow builder
Workflow Automation
prompt template
Act as a senior data engineer and analyst. I need to build an automated end-to-end analytics workflow for [use case, e.g., weekly sales reporting / monthly churn analysis / real-time fraud scoring].
The workflow should:
Extract data from [source: database / API / CSV] on [schedule: daily / weekly / on trigger]
Run data validation checks before processing
Transform and aggregate data according to [business logic: describe the KPIs and groupings needed]
Apply [analytical model or rule, e.g., scoring model, statistical summary, forecasting function]
Output results to [destination: dashboard, email report, Slack alert, database table]
Log errors and send an alert if any step fails
Be fully reproducible and parameterizable so it can be run for any time period on demand
Use Python (pandas, SQLAlchemy, schedule or Airflow DAG). Provide full modular code with a README and deployment checklist.
Prompt 049
Feature engineering for machine learning
ML & Prediction
prompt template
Act as a machine learning engineer. I have a raw dataset with columns: [list all columns and data types] that I need to prepare for a [classification / regression / ranking] model predicting [target variable].
Perform comprehensive feature engineering that:
Creates interaction features between [specific column pairs] and explains the business rationale
Encodes categorical variables using the most appropriate method per column (one-hot, target encoding, ordinal, binary) with justification
Applies date/time decomposition: extract day of week, hour, month, quarter, days since reference date, is_weekend flag
Engineers lag features and rolling window statistics (mean, std, min, max) for any time-dependent columns
Creates binned or bucketed versions of skewed numeric columns
Detects and removes or combines near-zero variance and perfectly correlated features
Produces a final feature importance pre-check using mutual information scores before any model is trained
Use Python (pandas, scikit-learn, feature-engine). Provide commented code and a feature catalog with business interpretation for each engineered variable.
Prompt 050
Multi-dimensional KPI decomposition
Business Analytics
prompt template
Act as a strategic analyst. A key business KPI — [e.g., revenue per user / gross margin / CAC payback period] — has changed by [X%] between [period A] and [period B]. I need to fully decompose what drove this change.
Decompose the KPI by:
Breaking it into its mathematical components (e.g., revenue per user = sessions x conversion rate x AOV)
Calculating the contribution of each component to the total change using a multiplicative or additive decomposition
Further breaking down each component by [dimension: region / channel / product / segment]
Identifying which single component and sub-dimension explains the largest share of the change
Checking whether the change is broad-based or concentrated in a small subset of records
Running a counterfactual: what would the KPI be if only [one component] had changed and everything else stayed flat?
Producing a waterfall chart of contributions and a 3-sentence executive summary of the finding
Use Python. Provide code, the waterfall chart, and the executive summary.
Prompt 051
Text classification model for support tickets
NLP & Text Analytics
prompt template
You are an NLP engineer. I have [X] customer support tickets with columns: [ticket_id, ticket_text, resolution_time_hours, agent_id, and optionally a category label for a subset]. I want to automatically classify incoming tickets into categories: [list your categories, e.g., billing, technical issue, account access, feature request, complaint].
Build a text classification pipeline that:
Cleans and preprocesses ticket text: lowercasing, punctuation removal, stopword filtering, lemmatization
Converts text to features using TF-IDF and separately using sentence embeddings (sentence-transformers)
Trains classifiers: Logistic Regression, SVM, and a fine-tuned DistilBERT — compare all three
Evaluates using macro F1 and per-class precision/recall — flag any category with F1 below [X%]
Handles class imbalance if any category has fewer than [X] training examples
Builds a confidence-based routing rule: high-confidence predictions auto-route, low-confidence flag for human review
Saves the production pipeline and outputs predictions for all unlabeled tickets with confidence scores
Use Python (scikit-learn, transformers, sentence-transformers). Provide code and a model performance report.
Prompt 052
Geographic market expansion analysis
Business Analytics
prompt template
Act as a market expansion analyst. My company currently operates in [current markets]. I am evaluating expanding into [list of candidate markets / regions]. I have the following data available: [market size, population, competitor presence, regulatory complexity score, logistics cost index, historical performance in similar markets].
Build an expansion prioritization analysis that:
Defines and weights scoring criteria: [market size, competition, cost to serve, strategic fit, regulatory risk]
Normalizes all criteria to a 0–10 scale and calculates a weighted opportunity score per market
Builds a 2x2 prioritization matrix: opportunity score vs. ease of entry
Identifies the top 3 markets to enter first and the rationale for each
Estimates Year 1 revenue potential for the top market using bottom-up assumptions
Flags the biggest risk for each top market and proposes a mitigation
Produces a one-page market entry scorecard ready for executive review
Use Python or Excel. Provide the scoring model, the 2x2 chart, and the scorecard.
Prompt 053
Real-time dashboard with streaming data simulation
Reporting & BI
prompt template
Act as a BI engineer. I need to build a real-time monitoring dashboard for [use case: operations center / e-commerce live sales / server health / call center]. The data updates every [X seconds / minutes].
Build a real-time dashboard that:
Simulates or connects to a streaming data source producing [describe metrics: orders per minute, error rate, active sessions, etc.]
Displays a live updating line chart for the primary metric over a rolling [X minute] window
Shows current-value KPI cards with delta vs. [X minutes ago] and color-coded status (green / amber / red)
Triggers a visible alert when any metric crosses a [user-defined threshold]
Logs all threshold breaches to a running alert table with timestamp and severity
Includes a toggle to pause / resume the live feed for investigation
Is deployable as a local web app requiring no external database
Use Python (Dash / Streamlit / Panel) with threading or async data refresh. Provide full runnable code and a deployment guide.
Prompt 054
Pricing elasticity experiment design
Experimentation
prompt template
Act as a pricing experimentation expert. I want to run a controlled pricing experiment to measure the price elasticity of [product / service / subscription tier].
Design the experiment by:
Defining the price points to test: [anchor price, treatment prices — at least 3 levels]
Calculating the required sample size per price point to detect a [X%] change in conversion rate at 80% power and 95% confidence
Specifying the randomization unit: [user / session / geography / account] and justifying the choice
Identifying and controlling for confounders: day of week, device, acquisition channel, user tenure
Setting up a guardrail metric to detect any unintended harm (e.g., support ticket spike, refund rate increase)
Designing the holdout strategy and experiment duration
Producing a pre-analysis plan: primary metric, secondary metrics, statistical test, stopping rules, and decision criteria
Provide a complete experiment brief in structured format ready for engineering and legal review.
Prompt 055
Data catalog and metadata management setup
Data Governance
prompt template
Act as a data governance engineer. My organization has [X] datasets across [list of systems: data warehouse, data lake, operational databases, third-party feeds] with no centralized catalog or metadata documentation.
Build a data catalog foundation by:
Defining a metadata schema: dataset name, owner, source system, refresh frequency, row count, last updated, sensitivity classification, primary key, description
Writing a Python script to auto-inventory all tables in [database system] and populate the schema
Classifying all datasets by sensitivity: public, internal, confidential, restricted
Identifying datasets with no documented owner and flagging for assignment
Detecting datasets that haven't been updated in [X days] and marking as potentially stale
Building a searchable HTML or markdown catalog output from the inventory
Recommending a governance process: who reviews the catalog, how often, and what triggers an update
Provide Python code, a sample catalog output, and a governance process one-pager.
Prompt 056
Causal inference analysis
Statistics
prompt template
Act as a causal inference specialist. I want to estimate the causal effect of [intervention: a policy change / feature launch / marketing campaign / price change] on [outcome metric], using observational data (no randomization was possible).
Dataset: [X rows], columns include: [treatment indicator, outcome variable, pre-treatment covariates, time variable if panel data].
Perform a causal analysis using:
Propensity score matching or inverse probability weighting to balance treatment and control groups — report covariate balance before and after
Difference-in-differences if panel data is available — verify the parallel trends assumption
Instrumental variable (IV) estimation if a valid instrument exists: [describe candidate instrument]
A sensitivity analysis to assess how robust the estimate is to hidden confounding (Rosenbaum bounds)
Report the average treatment effect (ATE) and average treatment effect on the treated (ATT) with confidence intervals
Interpret the result in plain business language: what actually caused what?
Flag the key assumptions made and how likely they are to hold in this context
Use Python (causalinference, econml, linearmodels). Provide code and a causal evidence summary.
Prompt 057
Automated reporting pipeline with email delivery
Workflow Automation
prompt template
Act as a data engineer. I need an automated reporting pipeline that pulls data, generates a formatted report, and emails it to [stakeholders] every [daily / weekly / monthly] on [schedule].
Build the pipeline that:
Connects to [data source: PostgreSQL / BigQuery / CSV / API] and runs the required query or extraction
Calculates the KPIs: [list metrics with formulas]
Generates a formatted report in [PDF / Excel / HTML email] with charts, summary tables, and period-over-period comparisons
Highlights metrics that are above or below target with color coding
Writes a plain-English narrative summary section that auto-populates based on the data values
Sends the report via email using [SMTP / SendGrid / AWS SES] to a configurable recipient list
Logs each run with status (success / failure), record count, and timestamp — alerts on failure via [Slack / email]
Use Python (pandas, matplotlib, smtplib or sendgrid, schedule or cron). Provide full modular code, a config file template, and a deployment checklist.
Prompt 058
Benchmarking model performance across data slices
ML & Prediction
prompt template
Act as a responsible AI and ML evaluation specialist. I have a trained [classification / regression] model and a labeled test set with columns: [features, target, and demographic or segment fields like region, age_group, device_type, customer_tier].
Evaluate model fairness and slice performance by:
Calculating overall model performance: accuracy, F1, AUC, RMSE as appropriate
Breaking down performance by each segment dimension: compute the same metrics per slice
Identifying slices where performance falls below [X%] of overall performance — flag these as underperforming
Testing whether performance gaps between slices are statistically significant
Checking for disparate impact: does the model systematically over-predict or under-predict for any group?
Tracing underperforming slices back to training data: are they underrepresented or have noisier labels?
Recommending remediation: targeted data collection, slice-specific retraining, or post-processing calibration
Use Python (sklearn, slicefinder or manual pandas groupby). Provide code, a slice performance heatmap, and a model fairness summary memo.
Prompt 059
Customer journey mapping with data
Customer Analytics
prompt template
Act as a customer experience analyst. I have event-level behavioral data tracking customers across multiple touchpoints with columns: [customer_id, event_type, channel, timestamp, session_id, device_type, and any outcome flags like purchase or churn].
Map and analyze the customer journey by:
Reconstructing individual customer paths by sequencing events chronologically per customer
Identifying the top 10 most common journey sequences from first touch to conversion or churn
Calculating average time between each stage transition and flagging stages with the longest delays
Building a Sankey diagram visualizing the flow of customers across all major touchpoints
Segmenting journeys by outcome: compare paths of converted customers vs. churned customers to identify divergence points
Detecting the single touchpoint whose presence most strongly correlates with a positive outcome using lift analysis
Recommending 3 journey optimization interventions with estimated impact on conversion rate or time-to-value
Use Python (pandas, plotly). Provide code, the Sankey diagram, and a journey insight summary for the CX or product team.
Prompt 060
Dimensionality reduction and visualization
Advanced Analytics
prompt template
Act as a data scientist. I have a high-dimensional dataset with [X features] and [X rows] representing [customers / products / documents / sensor readings]. The data is difficult to interpret or visualize in its raw form.
Apply dimensionality reduction by:
Preprocessing all features: scale numerics, encode categoricals, handle missing values
Applying PCA and retaining components that explain at least [80% / 90%] of variance — plot the scree plot and cumulative explained variance
Applying t-SNE with perplexity values of [5, 30, 50] and comparing the resulting cluster structures
Applying UMAP with [min_dist = 0.1, n_neighbors = 15] as an additional comparison
Coloring all 2D projections by [a known label or segment field] to assess whether structure aligns with business categories
Identifying any clear clusters or outlier groups visible in the reduced space and describing their characteristics in original feature terms
Recommending which reduction method best preserves the structure relevant to [your downstream task: clustering / anomaly detection / visualization] and explaining why
Use Python (scikit-learn, umap-learn, plotly). Provide code and a side-by-side comparison of all three projections with interpretation.