Friday, March 31, 2023
Get the latest A.I News on A.I. Pulses

How to Compare Two Tables For Equality in BigQuery | by Giorgos Myrianthous | Jan, 2023

January 27, 2023


Compare tables and extract their differences with standard SQL

Photo by Zakaria Ahada on Unsplash

Comparing tables in BigQuery is an important task when testing the results of data pipelines and queries before productionizing them. The ability to compare tables makes it possible to detect any changes or discrepancies in the data, ensuring that the data remains accurate and consistent.

In this article we will demonstrate how to compare two (or more) tables on BigQuery and extract the records that differ (if any). More specifically, we will showcase how to compare tables with identical columns as well as tables with a different number of columns.

First, let's start by creating two tables with some dummy values that we will reference throughout this tutorial in order to demonstrate a few different concepts.

-- Create the first table
CREATE TABLE `temp.tableA` (
  `first_name` STRING,
  `last_name` STRING,
  `is_active` BOOL,
  `no_of_purchases` INT
);
INSERT `temp.tableA` (first_name, last_name, is_active, no_of_purchases)
VALUES
  ('Bob', 'Anderson', True, 12),
  ('Maria', 'Brown', False, 0),
  ('Andrew', 'White', True, 4);

-- Create the second table
CREATE TABLE `temp.tableB` (
  `first_name` STRING,
  `last_name` STRING,
  `is_active` BOOL,
  `no_of_purchases` INT
);
INSERT `temp.tableB` (first_name, last_name, is_active, no_of_purchases)
VALUES
  ('Bob', 'Anderson', True, 12),
  ('Maria', 'Brown', False, 0),
  ('Andrew', 'White', True, 6),
  ('John', 'Down', False, 0);

Comparing records of tables with the same columns

Now that we have created our two example tables, you may have noticed that there are a couple of differences between them.

SELECT * FROM `temp.tableA`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob        | Anderson  | true      | 12              |
| Andrew     | White     | true      | 4               |
| Maria      | Brown     | false     | 0               |
+------------+-----------+-----------+-----------------+

SELECT * FROM `temp.tableB`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob        | Anderson  | true      | 12              |
| Andrew     | White     | true      | 6               |
| Maria      | Brown     | false     | 0               |
| John       | Down      | false     | 0               |
+------------+-----------+-----------+-----------------+

Now, assuming that table temp.tableB is the latest version of some dataset while temp.tableA is an older one, and we want to see the actual differences (in terms of records) between the two tables, all we need is the following query:

WITH table_a AS (SELECT * FROM `temp.tableA`),
table_b AS (SELECT * FROM `temp.tableB`),
rows_mismatched AS (
  SELECT
    'tableA' AS table_name,
    *
  FROM (
    SELECT
      *
    FROM
      table_a EXCEPT DISTINCT
    SELECT
      *
    FROM
      table_b
  )

  UNION ALL

  SELECT
    'tableB' AS table_name,
    *
  FROM (
    SELECT
      *
    FROM
      table_b EXCEPT DISTINCT
    SELECT
      *
    FROM
      table_a
  )
)

SELECT * FROM rows_mismatched

The result will contain all the differences observed between the tables, along with the name of the table in which each record was found.

In our specific example, tables A and B differ in two records. The first is the record for Andrew White, since this person has a different value in the no_of_purchases field. Additionally, tableB has one extra record that is not present in tableA at all.

+------------+------------+-----------+-----------+-----------------+
| table_name | first_name | last_name | is_active | no_of_purchases |
+------------+------------+-----------+-----------+-----------------+
| tableB     | John       | Down      | false     | 0               |
| tableB     | Andrew     | White     | true      | 6               |
| tableA     | Andrew     | White     | true      | 4               |
+------------+------------+-----------+-----------+-----------------+
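To experiment with this pattern without a BigQuery project, the same logic can be sketched locally. The snippet below is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for BigQuery. The local table names table_a and table_b and the helper diff_rows are hypothetical, booleans are stored as 0/1, and SQLite's plain EXCEPT (which also discards duplicate rows) plays the role of BigQuery's EXCEPT DISTINCT.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (first_name TEXT, last_name TEXT, is_active INTEGER, no_of_purchases INTEGER);
    CREATE TABLE table_b (first_name TEXT, last_name TEXT, is_active INTEGER, no_of_purchases INTEGER);
    INSERT INTO table_a VALUES
        ('Bob', 'Anderson', 1, 12), ('Maria', 'Brown', 0, 0), ('Andrew', 'White', 1, 4);
    INSERT INTO table_b VALUES
        ('Bob', 'Anderson', 1, 12), ('Maria', 'Brown', 0, 0), ('Andrew', 'White', 1, 6),
        ('John', 'Down', 0, 0);
""")


def diff_rows(connection):
    """Return rows found in only one of the two tables, tagged with their source table."""
    query = """
        SELECT 'tableA' AS table_name, * FROM (
            SELECT * FROM table_a EXCEPT SELECT * FROM table_b
        )
        UNION ALL
        SELECT 'tableB' AS table_name, * FROM (
            SELECT * FROM table_b EXCEPT SELECT * FROM table_a
        )
    """
    return connection.execute(query).fetchall()


mismatches = diff_rows(conn)
for row in sorted(mismatches):
    print(row)
```

An empty list from diff_rows would mean that the two tables contain the same set of distinct rows.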

Note: The comparison query in this section relies on the WITH clause to define Common Table Expressions (CTEs), which are named temporary result sets that the rest of the query can reference.

Comparing records of tables with different columns

Now let's suppose you want to compare the records of two tables that have a different number of columns. Clearly, we need an apples-to-apples comparison, meaning that we have to extract only the common fields from both tables in order to perform a meaningful comparison.

Let's re-create our tables so that some of their columns no longer match, and then demonstrate how to deal with such cases:

-- Re-create the first table (OR REPLACE overwrites the version from the previous section)
CREATE OR REPLACE TABLE `temp.tableA` (
  `first_name` STRING,
  `last_name` STRING,
  `is_active` BOOL,
  `dob` STRING
);
INSERT `temp.tableA` (first_name, last_name, is_active, dob)
VALUES
  ('Bob', 'Anderson', True, '12/02/1993'),
  ('Maria', 'Brown', False, '10/05/2000'),
  ('Andrew', 'White', True, '14/12/1997');

-- Re-create the second table
CREATE OR REPLACE TABLE `temp.tableB` (
  `first_name` STRING,
  `last_name` STRING,
  `is_active` BOOL,
  `no_of_purchases` INT
);
INSERT `temp.tableB` (first_name, last_name, is_active, no_of_purchases)
VALUES
  ('Bob', 'Anderson', True, 12),
  ('Maria', 'Brown', True, 0),
  ('Andrew', 'White', True, 6),
  ('John', 'Down', False, 0);

Now our new tables have only three columns in common, namely first_name, last_name and is_active.

SELECT * FROM `temp.tableA`;

+------------+-----------+-----------+------------+
| first_name | last_name | is_active | dob        |
+------------+-----------+-----------+------------+
| Bob        | Anderson  | true      | 12/02/1993 |
| Andrew     | White     | true      | 14/12/1997 |
| Maria      | Brown     | false     | 10/05/2000 |
+------------+-----------+-----------+------------+

SELECT * FROM `temp.tableB`;

+------------+-----------+-----------+-----------------+
| first_name | last_name | is_active | no_of_purchases |
+------------+-----------+-----------+-----------------+
| Bob        | Anderson  | true      | 12              |
| Andrew     | White     | true      | 6               |
| Maria      | Brown     | true      | 0               |
| John       | Down      | false     | 0               |
+------------+-----------+-----------+-----------------+

Now if we attempt to run the query we executed in the previous section, where the two tables had the same columns, we will end up with this error:

Column 4 in EXCEPT DISTINCT has incompatible types: STRING, INT64 at [13:7]

This is perfectly normal given that our tables no longer have matching columns. We need to slightly amend our initial query so that the very first CTEs select only the mutual columns of each table. Our query will look as below:

WITH table_a AS (
  SELECT
    first_name,
    last_name,
    is_active
  FROM `temp.tableA`
),
table_b AS (
  SELECT
    first_name,
    last_name,
    is_active
  FROM `temp.tableB`
),
rows_mismatched AS (
  SELECT
    'tableA' AS table_name,
    *
  FROM (
    SELECT
      *
    FROM
      table_a EXCEPT DISTINCT
    SELECT
      *
    FROM
      table_b
  )

  UNION ALL

  SELECT
    'tableB' AS table_name,
    *
  FROM (
    SELECT
      *
    FROM
      table_b EXCEPT DISTINCT
    SELECT
      *
    FROM
      table_a
  )
)

SELECT * FROM rows_mismatched
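Listing the mutual columns by hand gets tedious for wide tables. As a rough sketch, the shared column list could be derived programmatically and the comparison query generated from it. The helpers common_columns and build_diff_query below are hypothetical names introduced for illustration, and the column lists are typed in by hand here; in practice they would come from the table schemas.

```python
def common_columns(columns_a, columns_b):
    """Return the columns present in both lists, keeping the order of columns_a."""
    in_b = set(columns_b)
    return [col for col in columns_a if col in in_b]


def build_diff_query(table_a, table_b, columns):
    """Build the EXCEPT DISTINCT comparison query restricted to the given columns."""
    if not columns:
        raise ValueError("the tables have no columns in common")
    cols = ", ".join(columns)
    return f"""
WITH table_a AS (SELECT {cols} FROM `{table_a}`),
table_b AS (SELECT {cols} FROM `{table_b}`),
rows_mismatched AS (
  SELECT 'tableA' AS table_name, * FROM (
    SELECT * FROM table_a EXCEPT DISTINCT SELECT * FROM table_b
  )
  UNION ALL
  SELECT 'tableB' AS table_name, * FROM (
    SELECT * FROM table_b EXCEPT DISTINCT SELECT * FROM table_a
  )
)
SELECT * FROM rows_mismatched
""".strip()


# Column lists of the two example tables, written out by hand for illustration.
shared = common_columns(
    ["first_name", "last_name", "is_active", "dob"],
    ["first_name", "last_name", "is_active", "no_of_purchases"],
)
query = build_diff_query("temp.tableA", "temp.tableB", shared)
print(query)
```

Matching on column name alone is an assumption of this sketch: two columns that share a name but have incompatible types would still trigger the error shown earlier.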

The tables created in this section have the following mismatches (when considering only their mutual columns):

  • The record for Maria Brown differs in column is_active
  • Table tableB has one extra record (John Down) which is not present in tableA

These differences can be observed in the query results shared below:

+------------+------------+-----------+-----------+
| table_name | first_name | last_name | is_active |
+------------+------------+-----------+-----------+
| tableB     | Maria      | Brown     | true      |
| tableB     | John       | Down      | false     |
| tableA     | Maria      | Brown     | false     |
+------------+------------+-----------+-----------+
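One caveat applies to both queries: EXCEPT DISTINCT compares sets of distinct rows, so two tables that differ only in how many times a row is repeated produce an empty result. The sketch below illustrates this, assuming Python's built-in sqlite3 module as a local stand-in for BigQuery (SQLite's plain EXCEPT also discards duplicates). The two small tables and the row_counts_differ check are hypothetical additions and not part of the queries above.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (first_name TEXT, last_name TEXT);
    CREATE TABLE table_b (first_name TEXT, last_name TEXT);
    -- table_a holds the same row twice, table_b only once.
    INSERT INTO table_a VALUES ('Bob', 'Anderson'), ('Bob', 'Anderson');
    INSERT INTO table_b VALUES ('Bob', 'Anderson');
""")

# Distinct-row comparison in both directions: the duplicate is invisible to it.
only_in_a = conn.execute("SELECT * FROM table_a EXCEPT SELECT * FROM table_b").fetchall()
only_in_b = conn.execute("SELECT * FROM table_b EXCEPT SELECT * FROM table_a").fetchall()

# A separate row-count comparison catches the duplicated record.
count_a = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
count_b = conn.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
row_counts_differ = count_a != count_b

print(only_in_a, only_in_b, row_counts_differ)
```

If duplicate counts matter for your pipeline, comparing COUNT(*) per table alongside the EXCEPT DISTINCT query is one simple complement.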

Final Thoughts

In this article, we presented a guide on how to compare tables in BigQuery. We highlighted the importance of this task in ensuring the accuracy and consistency of data, and demonstrated techniques for comparing tables with identical columns as well as tables with a different number of columns. We also walked through the process of extracting the records that differ between tables (if any).

Overall, this article aimed to equip readers with the necessary tools and knowledge to effectively and efficiently compare tables in BigQuery. I hope you found it useful!

Copyright © 2022 A.I. Pulses.
A.I. Pulses is not responsible for the content of external sites.
