What is BigQuery?
- BigQuery is a fully managed big data tool for companies that need a cloud-based, interactive query service for massive datasets.
- BigQuery is not a traditional relational database; it is a query service.
- BigQuery supports SQL queries, which makes it quite user-friendly. It can be accessed from the Console, the CLI, or an SDK. You can query billions of rows: a query takes only seconds to write and, typically, seconds to return.
- You can also use its REST API and get your work done by sending a JSON request.
- Let's understand this with the help of an example. Suppose you are a data analyst and you need to analyze tons of data. If you choose a tool like traditional MySQL, you need to have infrastructure ready that can store this huge volume of data.
- Designing this infrastructure is itself a difficult task, because you have to figure out the RAM size, the CPU type, and other configuration details.
- With BigQuery, you can focus on analysis rather than on infrastructure. The hardware is completely abstracted away.
- BigQuery is designed mainly for big data analytics. Do not confuse it with an OLTP (Online Transaction Processing) database.
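To make the REST point above concrete, here is a minimal sketch of the JSON request that a query call would send. It only builds the URL and body for the `jobs.query` method of the BigQuery v2 REST API; it does not send anything. The project ID `my-project` and the helper name `build_query_request` are assumptions made for illustration, and a real call would also need an OAuth 2.0 access token in the `Authorization` header, which is not shown.

```python
import json
from typing import Tuple

# Base URL of the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str) -> Tuple[str, str]:
    """Return (url, json_body) for a jobs.query call. Nothing is sent."""
    url = f"{BASE_URL}/projects/{project_id}/queries"
    body = {
        "query": sql,
        # Ask for standard (Google) SQL instead of legacy SQL.
        "useLegacySql": False,
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID.
    url, body = build_query_request("my-project", "SELECT 1 AS answer")
    print("POST", url)
    print(body)
```

The body would be sent with an HTTP POST and a `Content-Type: application/json` header by whatever HTTP client you prefer.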
Terms related to BigQuery:
- Datasets: Datasets hold one or more tables of data.
- Tables: Tables are row-column structures that hold the actual data.
- Jobs: Operations that you perform on the data, such as loading data, running queries, or exporting data.
- This lab walks you through Cloud BigQuery.
- You will be creating a BigQuery Dataset and loading the CSV data.
- Log in to the GCP Console.
- Create a BigQuery dataset.
- Create a table.
- Load the data from an external CSV file.
- Read data from the table using a SQL query.
Creating a BigQuery Dataset:
1. Click the hamburger icon in the top-left corner.
2. Click BigQuery under the Big Data section.
3. Click the project ID listed in the sidebar.
4. Click the three dots and click Create Dataset.
5. Enter the dataset ID as whizlabs_bq_dataset.
6. Choose the data location as United States (US).
7. Keep the other options as they are.
8. Click Create Dataset.
9. Click the right arrow in the sidebar to expand the project.
10. You will be able to see the dataset. Click the dataset.
11. You can see the + icon to create a table and upload the data.
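The Console steps above correspond roughly to a single `datasets.insert` call in the BigQuery v2 REST API. The sketch below only builds that request (URL and JSON body) so you can see what the Console is filling in for you; it sends nothing. The project ID `my-project` and the helper name `build_create_dataset_request` are assumptions made for illustration, and authentication is left out.

```python
import json
from typing import Tuple

# Base URL of the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_create_dataset_request(
    project_id: str, dataset_id: str, location: str = "US"
) -> Tuple[str, str]:
    """Return (url, json_body) for a datasets.insert call. Nothing is sent."""
    url = f"{BASE_URL}/projects/{project_id}/datasets"
    body = {
        # Identifies the dataset to create inside the project.
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        # Matches the "Data location" choice made in the Console.
        "location": location,
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID; the dataset ID is the lab's.
    url, body = build_create_dataset_request("my-project", "whizlabs_bq_dataset")
    print("POST", url)
    print(body)
```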
Uploading the Source file:
1. Click Create Table.
2. In Source, choose the Upload option.
3. Click Browse to choose the CSV file from your local system. The lab provides a download link for the file to upload.
4. Enter the table name as user_details. Do not choose any other name; it is required for the validation of the lab.
5. Keep the other options as they are.
6. Select the check box to auto-detect the schema.
7. Click Create table.
8. Click the "user_details" table you have just created.
9. You will be able to see the table you created.
10. You can see the table's schema.
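The upload form above is, in effect, describing a load job. As a rough sketch, the job configuration that matches those choices (CSV source, schema auto-detection, destination table `user_details`) could look like the JSON built below. This only constructs the configuration body used by the `jobs.insert` method; the mechanics of attaching the file bytes and authenticating are deliberately omitted. The project ID `my-project` and the helper name `build_csv_load_config` are assumptions made for illustration.

```python
import json


def build_csv_load_config(project_id: str, dataset_id: str, table_id: str) -> str:
    """Return the JSON body describing a CSV load job with schema auto-detection."""
    body = {
        "configuration": {
            "load": {
                # Where the loaded rows should end up.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # The uploaded file is a CSV.
                "sourceFormat": "CSV",
                # Same effect as ticking the auto-detect check box.
                "autodetect": True,
            }
        }
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID; the other names are the lab's.
    print(build_csv_load_config("my-project", "whizlabs_bq_dataset", "user_details"))
```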
Querying the Data:
1. Click Compose new query.
2. Enter the following query to fetch data from the table, replacing <project_id> with your own project ID.
select username from `<project_id>.whizlabs_bq_dataset.user_details` where age > 32
3. Click Run to execute the query.
4. You can see the output in the Query results section.
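If you want to check what the filter in the query above does without a BigQuery project, the same SELECT shape can be tried locally. The sketch below uses Python's built-in SQLite as a stand-in, not BigQuery itself, and the sample rows are hypothetical because the lab's CSV contents are not shown here. An ORDER BY is added only to make the local output deterministic.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical sample rows (username, age); not the lab's real data.
SAMPLE_ROWS = [("alice", 29), ("bob", 41), ("carol", 33), ("dave", 32)]


def usernames_older_than(rows: Iterable[Tuple[str, int]], min_age: int) -> List[str]:
    """Load rows into an in-memory table and return usernames with age > min_age."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_details (username TEXT, age INTEGER)")
        conn.executemany("INSERT INTO user_details VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT username FROM user_details WHERE age > ? ORDER BY username",
            (min_age,),
        )
        return [row[0] for row in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    # The comparison is strict, so a user aged exactly 32 is excluded.
    print(usernames_older_than(SAMPLE_ROWS, 32))  # prints ['bob', 'carol']
```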
Completion and Conclusion:
- In this lab, you have created a BigQuery Dataset.
- You have created a table and loaded data into it from an external CSV file.
- You have read the data from the table using a SQL query.