DBMS Case Study Topics

Today we at TopicSuggestions will keep it real and practical: we see DBMS technology quietly powering payments, playlists, hospital charts, and classrooms every second. We work with students who need more than definitions of ACID, indexes, and sharding, so we frame those tools around outcomes like latency, throughput, cost, and reliability. We believe the best case studies are concrete, measurable, and scoped, and we wrote this post to hand you topic ideas you can actually execute and analyze.

Case Study Topic Ideas on DBMS

We organized the list by domain (finance, health, retail, media), workload (OLTP, OLAP, streaming), and theme (scalability, security, consistency, physical design), and we note typical datasets, methods, and evaluation metrics for each. Today we will keep it concise: right after this intro you’ll find a curated list of DBMS case study topics you can pick, tailor, and present with confidence.

1. We mine consent provenance graphs for AI training data governance

We investigate how we can model and mine consent provenance graphs linking data subjects, collection events, licenses, and model training runs.
We ask whether we can infer missing consent edges via semi-supervised graph completion without over-claiming consent.
We explore how we can query such graphs to support revocation at scale and compute which downstream models and datasets we must unlearn.
We evaluate whether we can design privacy-preserving provenance mining that proves compliance without exposing identities.

2. We detect data scars by mining latent signatures of synthetic–real mixtures in large corpora

We hypothesize that subtle generative “scars” persist after post-processing, and we ask whether we can localize these signals at the token, patch, or feature level.
We test whether we can attribute segments to specific generators under heavy augmentation and compression.
We examine how we can separate benign synthetic augmentation from harmful contamination in downstream tasks.
We assess whether we can certify mixture proportions with uncertainty bounds that transfer across domains.

3. We build negotiation-aware recommenders by mining multi-user bargaining dynamics for consensus

We study how we can mine conversational bargaining traces to learn preference concessions, vetoes, and fairness constraints.
We ask whether we can predict negotiation outcomes and recommend compromise items that maximize group satisfaction under strategic behavior.
We analyze how we can detect manipulative tactics and re-balance recommendations to protect vulnerable participants.
We evaluate whether we can adapt mined bargaining patterns across cultures and contexts without degrading trust.

4. We design carbon-constrained pattern mining that co-optimizes discovery utility and energy footprint

We propose integrating real-time grid carbon intensity and thermal limits into search heuristics for frequent pattern and subgraph mining.
We investigate whether we can jointly optimize interestingness and energy via multi-objective pruning with provable bounds.
We test whether we can learn energy-aware scheduling policies that generalize across datasets, hardware, and data scales.
We assess how we can report carbon audit trails for each discovered pattern to inform repeatability and policy.

5. We discover emergent norms by mining causal interaction motifs in multi-agent simulations

We examine how we can mine temporal-causal motifs from agent interaction logs that predict the birth and breakdown of social norms.
We ask whether we can detect early-warning motifs that precede phase transitions in cooperation or polarization.
We probe how we can map mined motifs to interpretable institutional rules and validate them against naturalistic data.
We evaluate whether we can transfer motif-based interventions from simulation to organizational settings ethically.

6. We track temporal privacy leakage by mining gradient trajectories in federated learning updates

We investigate how we can mine cross-round patterns in updates to identify re-identification and attribute inference risks for rare users.
We ask whether we can design trajectory obfuscation that disrupts leak signatures while preserving convergence.
We test whether we can trigger targeted unlearning when mined leak indicators exceed risk thresholds.
We evaluate how we can verify leakage mitigation with formal guarantees under heterogeneous clients.

7. We forecast socio-technical drift by mining code–discussion–dependency multiplex networks in open-source

We explore how we can mine multiplex patterns linking commits, issues, reviews, and dependencies to anticipate governance drift and maintainer burnout.
We ask whether we can detect early signals of supply-chain fragility and coordinated vulnerability introduction.
We test whether we can recommend community interventions based on mined patterns without amplifying inequities.
We evaluate how we can generalize drift predictors across ecosystems with minimal manual calibration.

8. We repair bias via counterfactual pipeline patching that mines minimal fairness-improving transformations

We study how we can mine minimal counterfactual edits to preprocessing, sampling, and labeling that improve fairness with bounded utility loss.
We ask whether we can attribute disparate impact to specific pipeline components via interventional pattern mining.
We test whether we can auto-suggest auditable patches that satisfy legal and organizational constraints.
We evaluate how we can maintain repaired pipelines under data drift without fairness regression.

9. We mine explanation ensembles to construct robust consensus rationale graphs across models

We explore how we can mine diverse rationales from heterogeneous explainers to build consensus explanation graphs resilient to spurious signals.
We ask whether we can detect collusive rationale modes that correlate with overfitting and downweight them during training.
We test whether we can use mined rationale diversity to guide active data acquisition and counterfactual labeling.
We evaluate how we can certify explanation stability across seeds, architectures, and distributions.

10. We infer annotator incentives by mining behavioral traces in human-in-the-loop labeling systems

We investigate how we can mine clickstreams, timing, and disagreement patterns to infer incentives, expertise, and fatigue in real time.
We ask whether we can design mechanism-aware workflows that align incentives and reduce shortcutting and adversarial behavior.
We test whether we can personalize task routing and feedback using mined profiles to improve label quality and well-being.
We evaluate how we can audit incentive policies for fairness across annotator demographics and experience levels.

11. Adaptive Temporal Normalization for Event-Driven DBMS

We pose these research questions: 1) Can we define a normalization theory that adapts to event-time semantics and out-of-order arrival while preserving concise schemas? 2) Can we automatically select normalization levels per event stream to minimize storage and query latency under time-travel semantics? 3) How does adaptive temporal normalization interact with incremental materialized views and late-arriving correction events?
We outline how to work on it: We will formalize temporal-normal forms, build cost models that account for event-time late arrivals and correction frequency, implement adaptive schema transforms in a prototype event-store (e.g., an extension of Apache Flink State or PostgreSQL with temporal features), and evaluate on synthetic and real event streams measuring storage, update cost, and query latency.
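
To make the cost model concrete, here is a minimal sketch of a per-stream normalization chooser. The StreamProfile fields and the linear cost coefficients are hypothetical placeholders for quantities the formal cost model would estimate from observed traffic.

```python
# Minimal sketch of a per-stream normalization chooser (hypothetical cost model).
# All coefficients are illustrative, not measurements from a real engine.
from dataclasses import dataclass

@dataclass
class StreamProfile:
    event_rate: float         # events per second
    late_arrival_rate: float  # fraction of events arriving out of order
    correction_rate: float    # correction events per second
    query_rate: float         # time-travel queries per second

def choose_normalization(p: StreamProfile) -> str:
    # Normalized schema: corrections touch one row, but queries pay for joins.
    normalized = p.correction_rate * 1.0 + p.query_rate * 5.0
    # Denormalized schema: corrections rewrite many rows, queries are single scans.
    denormalized = p.correction_rate * 20.0 + p.query_rate * 1.0
    # Heavy out-of-order arrival favors normalization (fewer rows to patch late).
    normalized -= p.late_arrival_rate * p.event_rate * 0.1
    return "normalized" if normalized <= denormalized else "denormalized"

print(choose_normalization(StreamProfile(1000.0, 0.05, 2.0, 50.0)))
```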

12. Quantum-Resistant Transaction Commit for Hybrid Classical–Quantum DBMS

We pose these research questions: 1) Can we design transaction commit and authentication protocols that remain secure when adversaries have quantum capabilities but database nodes are classical or hybrid quantum-classical? 2) How can we minimize performance overhead of post-quantum cryptography in two-phase and multi-phase commit? 3) Can we provide provable liveness and safety under a hybrid adversary model?
We outline how to work on it: We will model threat scenarios with quantum-capable adversaries, adapt post-quantum signatures/key-exchange to commit protocols, analyze formal safety/liveness using cryptographic proofs and TLA+/model checking, prototype in a distributed DBMS (e.g., CockroachDB patch), and benchmark cryptographic/latency overhead.

13. Energy-Proportional Query Planning with Per-Operator Voltage/Frequency Scaling

We pose these research questions: 1) Can we build a query planner that jointly optimizes energy consumption (Joules) and latency by exploiting per-operator voltage/frequency scaling (DVFS)? 2) Can we provide SLA-aware policies that trade energy for bounded latency degradation? 3) How do different storage/media tiers and operator implementations affect the energy-latency Pareto frontier?
We outline how to work on it: We will instrument a DBMS operator stack (e.g., PostgreSQL or a research engine) to control DVFS per CPU core, create cost/energy models per operator, implement an energy-aware planner that solves constrained optimization, and evaluate on server-grade hardware with power measurement (RAPL) and real workloads.
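
As a starting point for the planner, a brute-force sketch of SLA-constrained frequency selection is shown below; it assumes the textbook model where operator latency scales as 1/f and dynamic power as f^3, with made-up work estimates rather than measured coefficients.

```python
# Toy SLA-constrained, per-operator DVFS planner: enumerate frequency
# assignments and keep the lowest-energy plan that meets the latency SLA.
from itertools import product

FREQS = [1.0, 2.0, 3.0]  # frequency levels (GHz) available per core

def plan(operator_work_ms, sla_ms):
    """operator_work_ms: per-operator work, measured in ms at 1 GHz."""
    best = None
    for levels in product(FREQS, repeat=len(operator_work_ms)):
        latency = sum(w / f for w, f in zip(operator_work_ms, levels))
        # energy = time * power, with power modeled as proportional to f^3
        energy = sum((w / f) * f**3 for w, f in zip(operator_work_ms, levels))
        if latency <= sla_ms and (best is None or energy < best[0]):
            best = (energy, levels, latency)
    return best  # None means the SLA is infeasible even at maximum frequency

print(plan([10.0, 25.0, 5.0], sla_ms=20.0))
```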

14. Provenance-Aware Real-Time GDPR Deletion in Streaming DBMS

We pose these research questions: 1) Can we enforce selective, real-time “right-to-be-forgotten” deletions over continuous queries without stopping streams or recomputing entire windows? 2) Can we compress provenance metadata so that deletion targets are located and purged with bounded delay and memory? 3) What are semantic guarantees (consistency, correctness) after selective deletion in incremental aggregates and joins?
We outline how to work on it: We will design provenance tagging schemes optimized for streams, create index structures to locate and invalidate contributions efficiently, model semantics for deletion-corrected aggregates, implement in a streaming engine (e.g., Flink/Materialize prototype), and measure deletion latency, memory overhead, and correctness.
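
The core provenance-index idea can be shown with a toy in-memory window: each subject's contribution to an aggregate is indexed, so a deletion subtracts exactly that contribution instead of recomputing the window. A real design would need compressed per-subject summaries and bounded memory, but the deletion-corrected semantics are the same.

```python
# Toy deletion-corrected windowed SUM with a per-subject provenance index.
from collections import defaultdict

class Window:
    def __init__(self):
        self.total = 0.0
        self.by_subject = defaultdict(float)  # subject_id -> contribution

    def ingest(self, subject_id, value):
        self.total += value
        self.by_subject[subject_id] += value

    def forget(self, subject_id):
        # Right-to-be-forgotten: purge the subject's contribution in O(1).
        self.total -= self.by_subject.pop(subject_id, 0.0)

w = Window()
w.ingest("alice", 3.0); w.ingest("bob", 4.0); w.ingest("alice", 1.0)
w.forget("alice")
print(w.total)  # 4.0: corrected aggregate, no window recomputation
```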

15. ML-Guided Physical Schema Evolution with Provable Bounds on Query Performance Drift

We pose these research questions: 1) Can we use online learning to recommend physical schema changes (indexes, partitions) while providing provable bounds on worst-case query performance drift during adaptation? 2) Can we design exploration-exploitation strategies that safely adapt under workload drift? 3) How does bounded-risk evolution compare to human DBA policies?
We outline how to work on it: We will formalize performance drift risk, instantiate bandit/online-optimization algorithms with safety constraints, build a recommender integrated with a DBMS that applies changes in controlled fashion (rolling changes, canaries), and evaluate with workload traces, measuring regret, performance stability, and adaptation speed.
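
One way to prototype the safety-constrained exploration is an epsilon-greedy loop with a drift gate, sketched below. The measure callback and the latency numbers are hypothetical stand-ins for canary measurements against candidate index configurations.

```python
# Epsilon-greedy index-configuration advisor with a bounded-drift safety gate.
import random

def safe_advisor(configs, measure, baseline, max_drift=0.10,
                 epsilon=0.2, rounds=50, seed=0):
    rng = random.Random(seed)
    stats = {c: [baseline] for c in configs}  # observed latencies per config
    current = configs[0]
    for _ in range(rounds):
        if rng.random() < epsilon:
            candidate = rng.choice(configs)   # explore
        else:                                 # exploit the best observed mean
            candidate = min(configs, key=lambda c: sum(stats[c]) / len(stats[c]))
        stats[candidate].append(measure(candidate))
        mean = sum(stats[candidate]) / len(stats[candidate])
        # Safety gate: adopt only configs whose drift vs. baseline stays bounded.
        if mean <= baseline * (1 + max_drift):
            current = candidate
    return current

# Hypothetical canary: config "b" is genuinely faster, observed with noise.
true_latency = {"a": 10.0, "b": 8.0, "c": 12.0}
measure = lambda c: true_latency[c] * random.uniform(0.9, 1.1)
print(safe_advisor(["a", "b", "c"], measure, baseline=10.0))
```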

16. Emotion-Aware Indexing and Querying for Human–Computer Interaction Logs

We pose these research questions: 1) Can we design index structures that natively support retrieval over inferred emotional states in HCI logs (e.g., keystroke, gaze, biosignals) while preserving query performance? 2) Can we quantify and control bias and uncertainty introduced by emotion inference in retrieval results? 3) How do emotion-aware indexes affect downstream analytics and UX research reproducibility?
We outline how to work on it: We will collect or use annotated HCI datasets, design multi-dimensional index keys that combine temporal, spatial, and probabilistic emotion labels, implement uncertainty-aware scoring and ranking, perform user studies to evaluate retrieval utility, and analyze bias/variance introduced by inference models.

17. Blockchain-Native Optimistic Concurrency for Cross-Shard Analytical Queries

We pose these research questions: 1) Can we create optimistic concurrency and snapshot mechanisms tailored to sharded ledgers to support low-latency cross-shard analytics without global coordination? 2) Can we bound staleness and provide mechanisms for safe speculative results with verifiable rollback? 3) How do incentive and adversarial models in permissionless ledgers affect correctness?
We outline how to work on it: We will design optimistic snapshot protocols that piggyback on ledger consensus metadata, formalize staleness/rollback semantics, implement prototypes on a sharded smart-contract platform or a permissioned blockchain testbed, and evaluate latency, throughput, and robustness under adversarial shard behavior.

18. Explainable Approximate Join Operators for ML Pipelines with Bounded Interpretability Loss

We pose these research questions: 1) Can we design approximate join operators that provide both bounded error on join output and an explainability artifact that quantifies why specific tuples were approximated or omitted? 2) Can we integrate such operators into ML pipelines while preserving model interpretability and downstream performance guarantees? 3) How does explainability-aware approximation affect debugging and fairness audits?
We outline how to work on it: We will develop approximate join algorithms that produce compact provenance/explanation sketches, formalize interpretability loss metrics, integrate operators into an ML feature-engineering workflow, and evaluate effects on model accuracy, explanation fidelity, and human-in-the-loop debugging tasks.

19. Fuzzy Temporal Integrity Constraints for Noisy IoT Sensor Streams with Probabilistic Provenance

We pose these research questions: 1) Can we define and enforce fuzzy temporal integrity constraints (e.g., “usually within 5s”) for noisy IoT streams while propagating probabilistic provenance? 2) Can we perform efficient incremental repair and confidence propagation under constrained resource budgets? 3) How do fuzzy constraints improve downstream decision-making compared with hard constraints?
We outline how to work on it: We will formalize fuzzy temporal constraints and confidence semantics, build lightweight probabilistic provenance annotations tailored for constrained devices, design incremental repair algorithms that trade CPU/memory for confidence, implement on a stream-processing stack, and evaluate on IoT benchmarks and decision tasks.

20. Schema-Surfacing and Semantic-Drift Detection for LLM-Augmented OLTP Applications

We pose these research questions: 1) Can we automatically surface implicit schema extensions created by LLMs (e.g., newly synthesized attributes or denormalized fields) and detect semantic drift over time? 2) Can we design corrective actions (schema migration, constraint synthesis) that safely reconcile LLM-induced mutations with ACID semantics? 3) How do these techniques affect application correctness, auditability, and developer trust?
We outline how to work on it: We will instrument LLM-augmented application paths, extract and cluster emergent schema fragments, design drift detectors using semantic embeddings and constraint change detection, propose safe reconciliation strategies (shadow migrations, validation gates), and validate with case studies and developer user studies.

21. Temporal-blockchain-integrated DBMS for IoT provenance

We propose a DBMS design that fuses immutable blockchain-backed ledgers with temporal relational engines to record fine-grained IoT provenance.
We ask: How can we encode high-frequency temporal IoT events into a compact on-chain digest while preserving queryable temporal semantics?
We ask: How do we design hybrid query planners that transparently span on-chain immutable segments and off-chain mutable temporal tables?
We ask: What consistency models best balance provenance auditability and low-latency IoT ingestion?
We will prototype by instrumenting an existing temporal DBMS to emit cryptographic digests per time window, integrate a permissioned blockchain for digest anchoring, and evaluate ingestion throughput, temporal query latency, and forensic audit costs on realistic IoT traces.
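
A minimal sketch of the digest-emission step using plain SHA-256 chaining is shown below; the anchoring transaction to the permissioned ledger is deliberately left abstract, and canonical JSON serialization stands in for whatever wire format the engine actually uses.

```python
# Per-window digest chaining for ledger anchoring (hashlib only; the actual
# on-chain anchoring call is out of scope for this sketch).
import hashlib, json

def window_digest(events, prev_digest=b""):
    # Canonical ordering and serialization keep digests reproducible across replicas.
    h = hashlib.sha256(prev_digest)
    for e in sorted(events, key=lambda e: (e["ts"], e["device"])):
        h.update(json.dumps(e, sort_keys=True).encode())
    return h.digest()

w1 = [{"ts": 1, "device": "d1", "temp": 21.5},
      {"ts": 2, "device": "d2", "temp": 19.0}]
w2 = [{"ts": 6, "device": "d1", "temp": 21.7}]
d1 = window_digest(w1)
d2 = window_digest(w2, prev_digest=d1)  # chaining links windows for forensic audits
print(d2.hex())                         # this value would be anchored on-chain
```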

22. Adaptive energy-aware query caching for heterogeneous cloud DBMS

We design query caching policies that minimize energy-per-query across CPU/GPU/FPGA nodes in heterogeneous cloud DBMS deployments.
We ask: How do we model per-query energy cost across heterogeneous accelerators to inform caching decisions?
We ask: Can we learn dynamic cache admission policies that trade energy for latency under SLAs?
We ask: What are the system-level gains when coordinating cache placement with workload consolidation?
We will collect energy and latency profiles for representative queries on accelerators, build a reinforcement-learning cache controller in a DBMS prototype, and run cloud-scale experiments measuring energy, latency, and SLA violations.

23. Cross-modal schema evolution for multimodal ML datasets in DBMSs

We study schema evolution mechanisms that maintain consistency and queryability for multimodal datasets (text, image, audio) used in machine-learning pipelines.
We ask: How can we version and migrate multimodal schemas without invalidating models that depend on implicit feature encodings?
We ask: What metadata and transformation contracts are necessary to guarantee backward-compatible feature extraction?
We ask: How do we automatically generate migration plans that minimize retraining cost?
We will extend a DBMS metadata layer to track modality-specific schema versions, implement contract-based extractors, and evaluate on evolving datasets with model retraining cost and query correctness metrics.

24. Privacy-preserving adaptive indexing for encrypted hybrid workloads

We investigate adaptive indexing strategies that operate over encrypted data to accelerate mixed OLTP/analytical workloads while preserving differential privacy guarantees.
We ask: How can we build indexes whose creation and maintenance leak minimal statistical information under formal privacy budgets?
We ask: Can we design workload-aware encrypted indexes that adapt online without violating encryption or DP constraints?
We ask: What performance/utility trade-offs arise across encryption schemes (e.g., OPE, SSE) and DP mechanisms?
We will design index structures that incorporate randomized sketches under DP, prototype them over encrypted storage engines, and measure query latency, index size, and privacy-utility trade-offs on hybrid benchmarks.

25. Self-tuning materialized view prioritization in serverless DBMS environments

We create self-tuning algorithms that prioritize materialized view maintenance in ephemeral serverless DBMSs where resource availability and cost vary dynamically.
We ask: How do we schedule view maintenance under transient compute constraints to minimize cost while meeting freshness objectives?
We ask: Can we predict and pre-provision lightweight maintenance to exploit cold-start windows?
We ask: What incentives should multi-tenant workloads expose to share maintenance cost?
We will simulate serverless resource fluctuations, implement cost-aware maintenance policies using online learning, and evaluate monetary cost, staleness, and throughput on mixed workloads.

26. Explainable anomaly detection for query optimizer plan regressions

We focus on interpretable methods to detect and explain optimizer plan regressions that lead to sudden performance degradations.
We ask: What features derived from optimizer internal state best predict harmful plan regressions?
We ask: How do we produce human-readable explanations that map optimizer decisions to observable performance metrics?
We ask: Can we automate rollback or advisor recommendations grounded in the explanations?
We will collect optimizer traces and performance labels from controlled plan-change experiments, train interpretable models (rule lists, sparse trees), integrate explanation generation into the optimizer UI, and validate with DBAs on real incidents.

27. Fine-grained provenance-aware garbage collection for multi-tenant object-relational DBMS

We propose provenance-tracking GC that reclaims storage safely in multi-tenant object-relational DBMSs while preserving lineage required by tenants’ audits.
We ask: How can we maintain compact provenance summaries that allow per-tenant safe reclamation decisions?
We ask: What policies reconcile tenant retention policies, legal holds, and storage pressure?
We ask: How do we implement incremental GC with bounded overhead?
We will design lineage compression schemes, implement tenant-aware GC policies in an ORDBMS prototype, and measure reclaimable space, overhead, and correctness under mixed retention policies.

28. Distributed weak-consistency analytics under schema heterogeneity

We study analytics algorithms that tolerate weak consistency (causal/PRAM) and schema heterogeneity across geo-distributed DBMS replicas to enable low-latency global analytics.
We ask: How can we express analytics queries that are robust to schema divergence and produce bounded-error approximations?
We ask: What synchronization primitives yield the best latency/accuracy trade-offs for cross-site joins and aggregations?
We ask: Can we synthesize strategies that automatically reconcile schema variants for analytics?
We will formalize error bounds for approximate analytics under weak consistency, implement reconciliation layers that map variant schemas, and evaluate latency and approximation quality across geo-replicated datasets.

29. Quantum-inspired transaction scheduling heuristics for hybrid classical-quantum DBMS

We explore transaction scheduling heuristics inspired by quantum annealing principles to optimize lock contention and latency in hybrid systems where some processing offloads to NISQ devices.
We ask: Can quantum-inspired annealing heuristics reduce global contention and make preemption decisions beneficial for hybrid query pipelines?
We ask: How do we model cost surfaces combining classical locking and probabilistic quantum job runtimes?
We ask: What workload patterns benefit most from such heuristics?
We will develop annealing-based schedulers in a simulator that models classical locks and probabilistic quantum task durations, compare against classical schedulers, and identify crossover points where quantum-inspired methods improve throughput and latency.

30. Latent-space indexing for high-velocity unstructured sensor streams in time-series DBMS

We design indexing techniques that map high-velocity unstructured sensor data (e.g., images, audio) into compact latent spaces to support low-latency similarity and temporal queries in time-series DBMSs.
We ask: How do we maintain and update latent-space indexes under continuous concept drift with bounded memory?
We ask: What co-indexing strategies combine latent descriptors with classic time-partitioning for hybrid queries?
We ask: How does index freshness affect query accuracy for downstream anomaly detection?
We will integrate streaming representation learners (incremental embeddings) with approximate nearest-neighbor indexing in a TSDB prototype, and evaluate ingestion cost, query latency, and detection accuracy under drift scenarios.

31. We propose “Adaptive privacy-budget allocation for multi-tenant DBMS with heterogeneous workloads”.

We ask research questions: 1) How do we allocate differential-privacy budgets across tenants to maximize aggregate utility while satisfying fairness constraints? 2) How does workload heterogeneity (OLTP vs OLAP) change optimal budget schedules? 3) Can we design incentive-compatible budgets for tenants that self-report their workloads?
We will build a simulator and instrument a DBMS (e.g., PostgreSQL) to replay mixed-tenant traces, implement DP mechanisms with an RL-based allocator, and evaluate utility, latency, and fairness under realistic tenant mixes.
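
Before reaching for an RL allocator, a proportional scheme with a per-tenant fairness floor makes a useful baseline; the tenant weights and the total budget below are purely illustrative.

```python
# Baseline per-tenant privacy-budget allocator: fairness floor plus
# utility-proportional division of the remaining epsilon.

def allocate_budget(total_eps, utility_weight, floor_eps):
    tenants = list(utility_weight)
    reserved = floor_eps * len(tenants)
    assert reserved <= total_eps, "fairness floor infeasible for this budget"
    spare = total_eps - reserved
    wsum = sum(utility_weight.values())
    return {t: floor_eps + spare * utility_weight[t] / wsum for t in tenants}

# OLAP-heavy tenants (more analytic utility per unit of epsilon) get more budget.
print(allocate_budget(3.0,
                      {"oltp_a": 1.0, "olap_b": 3.0, "olap_c": 2.0},
                      floor_eps=0.2))
# {'oltp_a': 0.6, 'olap_b': 1.4, 'olap_c': 1.0}
```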

32. We propose “Transactional scheduling for energy-aware heterogeneous storage tiers in DBMS”.

We ask research questions: 1) How to schedule transactions and place data across NVMe, SSD, and HDD tiers to minimize energy with SLA constraints? 2) What are the tradeoffs between energy, tail latency, and write amplification? 3) Can online policies adapt to diurnal and workload bursts?
We will model tier power and performance, extend a DBMS scheduler to support energy-aware placement policies, run experiments on emulated tiered storage with representative OLTP workloads, and measure energy savings versus performance loss.

33. We propose “Schema-evolution-aware materialized-view maintenance using semantic differencing”.

We ask research questions: 1) How to detect semantic changes in base schemas that affect materialized views beyond syntactic diffs? 2) Can we compute minimal incremental maintenance plans under schema evolution? 3) How to minimize downtime and recomputation during migrations?
We will design a semantic-diff engine that reasons about constraints and mappings, integrate with a view maintenance planner to produce incremental rewrite plans, and validate on longitudinal schema-migration logs and benchmarks.

34. We propose “Explainable anomaly diagnosis for DBMS performance regressions using causality graphs”.

We ask research questions: 1) How to automatically construct causal graphs from DBMS telemetry to explain regressions? 2) How to produce succinct, human-readable remediation suggestions with confidence scores? 3) Can such explanations be used to auto-trigger safe mitigations?
We will collect system and query-level telemetry, apply causal-discovery algorithms tuned for DBMS signals, generate natural-language explanations and ranked fixes, and evaluate developer interpretability and remediation success in A/B tests.

35. We propose “Blockchain-anchored provenance for GDPR-compliant data erasure in distributed DBMS”.

We ask research questions: 1) How to reconcile immutable ledger anchoring with the right-to-be-forgotten across replicas and caches? 2) Can we prove erasure to auditors without exposing deleted data? 3) What are the performance and storage tradeoffs of hybrid anchoring designs?
We will design a hybrid provenance protocol that stores encrypted payloads off-chain with on-chain commitments and zero-knowledge erasure proofs, implement a prototype across geo-replicated nodes, and measure auditability, compliance guarantees, and overhead.

36. We propose “Neural query planner that adapts to user intent drift”.

We ask research questions: 1) How to detect shifts in user intent from query embeddings and feedback signals? 2) How to adapt a learned planner online without catastrophic forgetting? 3) What stability guarantees are required for production adoption?
We will train a transformer-based planner on historical plan traces and feedback, implement drift detectors and continual-learning updates, deploy in a controlled DBMS proxy, and compare latency and plan quality under synthetic and real intent-drift scenarios.

37. We propose “Auto-tiering for HTAP using workload-aware micro-partitioning”.

We ask research questions: 1) What micro-partition granularity optimally balances OLTP latency and OLAP throughput in HTAP systems? 2) How to trigger migrations and maintain transactional consistency across tiers? 3) How to minimize interference between analytical scans and transactional hot partitions?
We will implement a micro-partitioner that profiles access heat and analytic scan locality, build migration triggers with lightweight consistency protocols, and evaluate on HTAP benchmarks measuring end-to-end latency, throughput, and migration overhead.

38. We propose “Formal verification of eventual-consistency guarantees in geo-replicated DBMS with client-side caching”.

We ask research questions: 1) How to formally model client-side caches, replica propagation, and staleness bounds? 2) Can we verify application-level invariants under realistic network partitions? 3) How to extract counterexamples that guide runtime mitigations?
We will create a TLA+/PlusCal model of geo-replication and caching layers, perform model checking for staleness and invariant violations, and validate model predictions with trace-driven simulations and targeted stress tests.

39. We propose “Privacy-preserving collaborative data cleaning across organizations using secure multiparty computation in DBMS pipelines”.

We ask research questions: 1) How to perform deduplication and record linkage without revealing raw records across parties? 2) What MPC primitives and optimizations best fit common cleaning operations like canonicalization and fuzzy matching? 3) How to integrate MPC tasks into ETL pipelines with acceptable latency?
We will design MPC protocols tailored to cleaning primitives (approximate joins, record linkage), implement them as DBMS pipeline operators with crypto optimizations (OT batching, PSI), and benchmark accuracy, privacy, and throughput on multi-party datasets.

40. We propose “Latency-tail prediction and tail-aware transaction routing using per-query micro-SLA profiling”.

We ask research questions: 1) How to build per-query micro-SLA profiles that predict tail latency under contention and resource shifts? 2) Can we route transactions to replicas or specialized nodes to meet micro-SLAs with minimal cost? 3) How to manage policies when micro-SLAs conflict?
We will collect fine-grained latency distributions per query shape, train quantile-regression models for tail prediction, implement a routing layer that uses cost-aware optimization to satisfy micro-SLAs, and evaluate on multi-replica clusters under varying load patterns.
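
The routing layer can be prototyped from empirical tail profiles alone, before any quantile-regression model is trained: pick the cheapest replica whose observed p99 for the query shape meets the micro-SLA. The samples and costs below are made up.

```python
# Tail-aware routing from empirical per-replica, per-query-shape latency samples.

def p99(samples):
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

def route(shape, profiles, cost, sla_ms):
    """profiles[replica][shape] -> latency samples; cost[replica] -> $ per query."""
    feasible = [r for r in profiles if p99(profiles[r][shape]) <= sla_ms]
    if not feasible:  # no replica meets the micro-SLA: degrade to best effort
        return min(profiles, key=lambda r: p99(profiles[r][shape]))
    return min(feasible, key=lambda r: cost[r])  # cheapest feasible replica

profiles = {
    "hot_replica":  {"point_lookup": [1, 1, 2, 2, 9]},
    "cold_replica": {"point_lookup": [3, 3, 4, 4, 5]},
}
print(route("point_lookup", profiles,
            {"hot_replica": 3.0, "cold_replica": 1.0}, sla_ms=6))  # cold_replica
```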

41. Energy-aware Adaptive Indexing for In-memory DBMS

Topic: We investigate adaptive indexing strategies that minimize energy consumption in large in-memory DBMS while preserving query latency guarantees.
Research questions: We ask (1) How much energy can adaptive indexing strategies save compared with static indexing under realistic workloads? (2) How do we model energy vs. latency trade-offs at index-structure granularity? (3) How can we design controllers that adapt index maintenance frequency based on energy budgets?
Overview of how to work on it: We build controlled in-memory DBMS prototypes (or extend an open-source engine) with hooks to measure CPU, DRAM, and NVMe energy. We run mixed OLTP/OLAP workloads on simulated or real servers and implement several adaptive policies (e.g., incremental indexing, prioritized background reorganization). We evaluate Pareto frontiers of energy vs. latency and use reinforcement learning or model-predictive control to adapt indexing under energy constraints.

42. Self-explaining Anomaly Detection in Transaction Logs

Topic: We design DBMS-native anomaly detectors for transaction logs that produce human-understandable explanations tied to schema and business logic.
Research questions: We ask (1) What representations of transactions and schema enable diagnostic explanations? (2) How do we bridge statistical anomaly scores to causal explanations in terms of constraints, foreign keys, or business rules? (3) How can explanations be automatically prioritized for DBAs?
Overview of how to work on it: We extract feature-rich representations combining query text, parameter values, execution plans, and constraint violations. We train unsupervised models (e.g., contrastive, representation learning) and incorporate symbolic reasoning over schema metadata to generate hypotheses. We evaluate explanation fidelity and usefulness with user studies involving DBAs and measure detection precision/recall on synthetic and real anonymized logs.

43. Composable Incremental Recomputation for ML Model Serving inside DBMS

Topic: We explore incremental recomputation mechanisms within DBMS that maintain ML model outputs (predictions, features) under streaming updates without full retraining.
Research questions: We ask (1) What incremental algorithms map naturally to relational operators for common model families? (2) How do we compose incremental updates across feature pipelines, model retraining windows, and caching layers? (3) How to ensure bounded approximation error while minimizing recompute cost?
Overview of how to work on it: We formalize incremental propagation semantics for typical feature transforms and models (linear, tree ensembles, embeddings). We implement composable operators inside a DBMS or streaming SQL engine and benchmark on model-serving workloads with update traces. We design error-control knobs and policies (lazy vs. eager), then measure latency, throughput, and model accuracy decay.
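
For the linear-model case, incremental propagation reduces to patching a cached score by the weighted feature delta, as the toy sketch below shows; tree ensembles and embeddings would need richer delta rules.

```python
# Delta propagation for a linear model served from a feature table: an update
# to one feature of one entity patches only that entity's cached score.

class IncrementalLinearScores:
    def __init__(self, weights):
        self.w = weights       # feature_name -> model weight
        self.features = {}     # entity -> {feature_name: value}
        self.score = {}        # entity -> cached prediction

    def upsert(self, entity, feature, value):
        feats = self.features.setdefault(entity, {})
        delta = value - feats.get(feature, 0.0)
        feats[feature] = value
        # Patch the cached score instead of re-scoring the whole feature row.
        self.score[entity] = self.score.get(entity, 0.0) + self.w[feature] * delta

m = IncrementalLinearScores({"clicks": 0.5, "spend": 2.0})
m.upsert("user1", "clicks", 10)   # score = 5.0
m.upsert("user1", "spend", 3)     # score = 11.0
m.upsert("user1", "clicks", 12)   # patched by 0.5 * (12 - 10)
print(m.score["user1"])           # 12.0
```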

44. Privacy-preserving Causal Inference from Observational DBMS Data

Topic: We develop methods for performing causal effect estimation over relational datasets inside a DBMS while providing differential privacy guarantees.
Research questions: We ask (1) How to design SQL-based or query-planner-aware transformations that implement private propensity-score matching or instrumental-variable estimators? (2) How to allocate privacy budget across multi-table joins, stratifications, and model selection? (3) How does privacy noise interact with causal identification assumptions?
Overview of how to work on it: We implement private primitives (noisy counts, private model fitting using DP-optimizers) as relational operators and extend query optimizers to plan privacy-budget-aware pipelines. We create synthetic and semi-synthetic relational causality benchmarks with known effects and evaluate bias, variance, and privacy-utility trade-offs, plus provide guidance for practitioners on budget allocation.

45. Adaptive Hybrid Consistency for Geo-distributed HTAP Systems

Topic: We propose adaptive consistency controllers that switch or tune consistency levels per transaction in geo-distributed Hybrid Transactional/Analytical Processing (HTAP) systems.
Research questions: We ask (1) How to automatically classify transactions and analytical queries by their tolerance to staleness and isolation anomalies? (2) What controllers can dynamically trade consistency, latency, and freshness based on workload and SLA signals? (3) How to verify correctness boundaries when mixing consistency modes?
Overview of how to work on it: We instrument an HTAP prototype with multi-level consistency primitives (causal, timeline, serializable via coordinated commits) and build classifiers using workload telemetry to tag requests. We design controllers (heuristic and RL-based) that assign consistency modes to meet SLAs. We evaluate across geo-distributed workloads, measure staleness, latency, and throughput, and produce formal arguments about anomaly exposure.

46. Cultural Ontology-Driven Schema Evolution in Multilingual Applications

Topic: We study schema evolution driven by cultural and linguistic ontologies to support semantic correctness when databases serve multilingual, multicultural applications.
Research questions: We ask (1) How can cultural ontologies inform schema refactorings that avoid semantic loss across locales? (2) How to automate migration scripts that preserve culturally-sensitive constraints and classifications? (3) How to detect evolution anti-patterns that produce biased or invalid data views for specific cultures?
Overview of how to work on it: We curate cultural ontologies and map them to schema metadata and domain constraints. We design evolution operators that embed ontology-aware transformations (e.g., canonicalization, extension, partitioning) and generate migration plans with validation checks. We validate on case studies (e-commerce, healthcare) across simulated locales and perform qualitative assessments with domain experts.

47. Quantum-friendly Query Planner Heuristics for Noisy Intermediate-Scale Quantum Accelerators

Topic: We research DBMS query planning heuristics that optimally integrate NISQ quantum accelerators for subroutines (e.g., joins, optimization, sampling) given quantum noise and limited qubit counts.
Research questions: We ask (1) Which relational primitives provide net throughput or accuracy benefits when offloaded to NISQ hardware? (2) How to cost quantum offload accounting for compilation, noise, and repetition? (3) How to plan hybrid CPU-quantum pipelines to maximize end-to-end quality under time budgets?
Overview of how to work on it: We model quantum subroutine costs and fidelity using NISQ profiles and extend a query optimizer with offload-aware cost models. We prototype hybrid executors that simulate or use cloud quantum backends for small tasks (e.g., amplitude estimation for cardinality, variational circuits for optimization). We compare purely classical, purely simulated quantum, and hybrid executions on representative analytics queries.

48. Differentially-Private Synthetic Schema and Data Generator for Benchmarking

Topic: We create a DBMS-aware generator that synthesizes relational schemas and data under differential privacy for use in benchmarks while preserving workload-relevant correlations.
Research questions: We ask (1) How to jointly privatize schema structure (keys, foreign relationships) and data distributions while maintaining realistic query costs? (2) How to quantify utility of synthetic workloads for optimizer tuning and benchmarking? (3) How to parameterize generators so benchmarks target specific performance aspects (e.g., join skew, nestedness) under privacy constraints?
Overview of how to work on it: We design hierarchical DP mechanisms for schema graphs and relational distributions, then implement a generator that outputs schema DDL and instance data. We evaluate utility by comparing optimizer plans, cardinality error profiles, and benchmark latencies between synthetic-private and original datasets, and study privacy-utility trade-offs across parameter settings.

49. Schema-aware, Policy-compliant Data Placement for Carbon-aware DBMS

Topic: We develop DBMS placement algorithms that assign data and shards across data centers to minimize carbon intensity while obeying schema locality, latency, and legal policies.
Research questions: We ask (1) How to model placement cost combining carbon footprint, query latency, and schema-driven co-location constraints? (2) How to incrementally re-place data safely under live workloads? (3) How to integrate renewable forecasts and spot prices into placement decisions?
Overview of how to work on it: We formalize multi-objective placement as constrained optimization with schema co-location constraints (e.g., foreign-key colocations). We build placement simulators fed by geo carbon traces and query workloads, and implement online rebalancers that use predictive forecasts. We measure carbon reduction, latency impact, and migration overheads on synthetic and real traces.
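
Stripped to its core, the placement step can be prototyped as a greedy assignment over per-group feasible regions, where shard groups are kept together to respect foreign-key co-location. The carbon intensities and feasibility sets below are illustrative; a real rebalancer would re-solve as forecasts update.

```python
# Greedy carbon-aware shard-group placement under co-location and policy constraints.

carbon = {"eu-north": 30, "us-east": 400, "ap-south": 700}  # gCO2/kWh, illustrative
feasible_regions = {  # regions allowed per group by latency and legal policy
    "orders+customers": {"eu-north", "us-east"},
    "telemetry":        {"us-east", "ap-south"},
}

def place(groups):
    # Each group (a foreign-key co-location unit) goes to its cleanest legal region.
    return {g: min(feasible_regions[g], key=lambda r: carbon[r]) for g in groups}

print(place(["orders+customers", "telemetry"]))
# {'orders+customers': 'eu-north', 'telemetry': 'us-east'}
```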

50. Explainable Repair Operators for Inconsistent Databases using Human Rules

Topic: We design repair operators that reconcile inconsistent databases by combining automated inference with human-provided soft rules and produce concise explanations of chosen repairs.
Research questions: We ask (1) How to represent human rules and preferences so automated repairs respect intent and are auditable? (2) How to rank multiple repair candidates and explain trade-offs in terms easily consumed by domain experts? (3) How to design interactive workflows for iterative repair and validation?
Overview of how to work on it: We represent human rules as weighted constraints and integrate them into optimization-based repair engines (ILP/MaxSAT) with cost models for data changes. We implement explanation generation that maps repairs back to violated rules and quantifies consequences. We evaluate on real inconsistency benchmarks and conduct user studies measuring trust, speed of correction, and acceptance of repairs.
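
A simple way to prototype the ranking-plus-explanation loop is to score candidate repairs against weighted soft rules, as sketched below. The rules, weights, and repair candidates are hypothetical; a real engine would enumerate candidates with ILP/MaxSAT as described above.

```python
# Ranking candidate repairs by violated soft-rule weight plus change cost,
# with an explanation listing the rules a candidate still violates.

rules = {  # rule name -> (weight, predicate over the repaired record)
    "age_plausible": (5.0, lambda r: 0 <= r["age"] <= 120),
    "status_known":  (2.0, lambda r: r["status"] in {"active", "closed"}),
}

def score(record, repair, change_cost):
    fixed = {**record, **repair}
    violated = [name for name, (w, pred) in rules.items() if not pred(fixed)]
    penalty = sum(rules[name][0] for name in violated)
    return penalty + change_cost, violated

dirty = {"age": 230, "status": "actve"}   # deliberately inconsistent record
candidates = [({"age": 23}, 1.0),         # cheap repair, leaves one violation
              ({"age": 23, "status": "active"}, 2.0)]
for repair, cost in sorted(candidates, key=lambda rc: score(dirty, *rc)[0]):
    total, violated = score(dirty, repair, cost)
    print(repair, "score:", total, "still violates:", violated)
```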

Drop your assignment info and we’ll craft some dope topics just for you.

It’s FREE 😉
