Data Relationship Analysis Tool

✪ About

Column Pruning

Logically, a field that has no real effect on a SQL statement's result is redundant; from a requirements standpoint, a field that the SQL includes but the requirement never uses is redundant too. Redundant fields add input/output load during SQL execution and make the processing logic more complex. This SQL optimization tool locates redundant fields in SQL and optimizes them away.
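The idea can be illustrated with a minimal sketch. It does not show how the tool itself works: the `Stage` class, the `prune` function, and the sample query shape are all hypothetical, and a real implementation would work on a parsed SQL plan rather than hand-written dictionaries. Starting from the columns the requirement actually uses, the sketch walks from the outermost SELECT inward and keeps only the columns each level still needs.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One SELECT level (hypothetical model, not the tool's own data structure)."""
    name: str
    outputs: dict[str, set[str]]  # output column -> input columns it is computed from
    filters: set[str]             # input columns used in WHERE/JOIN conditions


def prune(stages: list[Stage], required: set[str]) -> list[Stage]:
    """Keep only the columns that are still needed at each level.

    `stages` is ordered outermost first; each stage reads the outputs
    of the stage that follows it.
    """
    pruned = []
    needed = set(required)
    for stage in stages:
        kept = {col: deps for col, deps in stage.outputs.items() if col in needed}
        pruned.append(Stage(stage.name, kept, stage.filters))
        # The next (inner) stage must supply the inputs of the kept outputs
        # plus any column this stage filters on.
        needed = set(stage.filters)
        for deps in kept.values():
            needed |= deps
    return pruned


# Assumed example: the inner query also selects `city` and `status`,
# but the outer query never reads them.
stages = [
    Stage("outer", {"user_id": {"user_id"}, "total": {"amount"}}, filters=set()),
    Stage(
        "inner",
        {
            "user_id": {"user_id"},
            "amount": {"price", "qty"},
            "city": {"city"},
            "status": {"status"},
        },
        filters={"status"},
    ),
]
pruned = prune(stages, required={"user_id", "total"})
for stage in pruned:
    print(stage.name, sorted(stage.outputs))
```

In this example the inner level keeps only `user_id` and `amount`; `city` and `status` are dropped from its select list, although `status` is still read by the inner WHERE clause.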

✪ColumnPruning Video

✪ About

Data traceability

In the world of data, data lineage means recording and reconstructing data across its entire production lifecycle, from creation through dissemination to retirement, including every transformation and processing step it undergoes along the way. It can be thought of as reverse engineering the data production process. Data lineage is widely used in data quality control, data governance, and related areas. With data provenance techniques, lineage information can be collected and recorded automatically while the data is being computed.
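As a rough illustration of tracing data back to its origin, the sketch below walks recorded lineage edges upstream from one column. The edge list, the column names, and the functions `build_parents` and `trace_origins` are assumptions made for this example; they are not the tracer's actual output format or API.

```python
from collections import defaultdict

# Hypothetical column-level lineage edges: (source column, derived column).
EDGES = [
    ("orders.price", "order_items.amount"),
    ("orders.qty", "order_items.amount"),
    ("order_items.amount", "daily_sales.total"),
    ("orders.order_date", "daily_sales.day"),
]


def build_parents(edges):
    """Index the edges so each column maps to its direct upstream columns."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
    return parents


def trace_origins(column, parents):
    """Return the root columns (no recorded upstream) that `column` derives from."""
    origins, stack, seen = set(), [column], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared or cyclic paths
        seen.add(current)
        upstream = parents.get(current)
        if not upstream:
            origins.add(current)
        else:
            stack.extend(upstream)
    return origins


parents = build_parents(EDGES)
print(sorted(trace_origins("daily_sales.total", parents)))
```

Here `daily_sales.total` traces back to `orders.price` and `orders.qty`, passing through the intermediate `order_items.amount`.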

✪Tracer Video

✪ About

DataLineage Visualization

Data production is cumbersome and resource-intensive, and the diversity and variability of the data make it more complex still. Frequent iterations of processing logic make data quality even harder to guarantee. To produce data efficiently and stably, with quality under control, the key is to clarify where the data came from, track the processing steps it has gone through, and understand the references and dependencies between datasets. Taken together, this information constitutes the data's lineage.
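One simple way to picture such dependencies is to write them out as a graph description. The sketch below turns a handful of table-level edges into Graphviz DOT text; the table names and the `to_dot` helper are hypothetical, and Graphviz is used here only as an assumed rendering target, not as a statement about how this tool draws its graphs.

```python
# Hypothetical table-level dependencies: (upstream table, downstream table).
TABLE_EDGES = [
    ("ods.orders", "dwd.order_items"),
    ("ods.users", "dwd.order_items"),
    ("dwd.order_items", "ads.daily_sales"),
]


def to_dot(edges):
    """Render lineage edges as a left-to-right Graphviz digraph."""
    lines = ["digraph lineage {", "  rankdir=LR;"]
    for source, target in sorted(edges):
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(TABLE_EDGES)
print(dot_text)
```

The resulting text can be saved to a file and passed to a Graphviz renderer to get a picture of which tables feed which.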

✪DataLineage Video

✪ What can I do

SQLLineage Toolbox

DataLineage Visualization

Column Pruning

Data traceability

Spark SQL Tracer

✪ Contact me