Tech Stuff, BigData & more



Analyzing Redshift Spectrum ‘Patronus’

By: Albert Franzi
On: 19th November 2017
In: AWS

Redshift Spectrum ‘Patronus’: This post aims to analyze how Redshift Spectrum works and how we can take advantage of it. I will try to load data from S3, such as Sessions (Parquet) and Raw Data (JSON). First of all, we will follow the Getting Started Using Spectrum guide. To use Redshift Spectrum, the cluster needs to be at version 1.0.1294 or later. We can validate that by executing select version();

dwh_sch=# select version();
                                                         version
--------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1499
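As the next step from the getting-started guide, data in S3 is exposed to Spectrum through an external schema and external tables. The snippet below is only a sketch: the IAM role ARN, database name, bucket path and column list are placeholders, not the ones used in this post.

create external schema spectrum_schema
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'
create external database if not exists;

-- Sessions stored as Parquet in S3 (columns are illustrative)
create external table spectrum_schema.sessions (
    session_id  varchar(64),
    user_id     varchar(64),
    started_at  timestamp
)
stored as parquet
location 's3://my-bucket/sessions/';

Once the external table exists, it can be queried and joined against local Redshift tables like any other relation.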

Something to keep in mind: Redshift Spectrum pricing. With Redshift Spectrum, you are billed at $5 per terabyte of data scanned, rounded up to the next megabyte, with a 10 megabyte minimum per query. For example, if you scan 10… Read More →
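As a rough worked example of that billing rule (the figures below are my own illustration, not taken from the post):

cost per query = $5/TB × max(data scanned rounded up to the next MB, 10 MB)

So a query that scans 2 TB costs 2 × $5 = $10, while a query that touches only 1 MB is still billed for the 10 MB minimum, a small fraction of a cent.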


Learning Session – AWS Athena

By: Albert Franzi
On: 22nd March 2017
In: AWS, BigData

AWS – Athena: What is Athena? Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (S3) using standard SQL. With a few actions in the AWS Management Console, customers can point Athena at their data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds. Athena is serverless, so there is no infrastructure to set up or manage, and customers pay only for the queries they run. Athena scales automatically, executing queries in parallel, so results are fast even with large datasets and complex queries. When should I use… Read More →
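To make that concrete, here is a minimal sketch of pointing Athena at raw JSON events in S3 and running an ad-hoc query; the table name, schema, SerDe and bucket path are illustrative assumptions, not details from the post.

-- External table over raw JSON events in S3 (schema and location are placeholders)
CREATE EXTERNAL TABLE raw_events (
  event_id   string,
  user_id    string,
  event_time timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/raw-events/';

-- Ad-hoc query run straight from the console, billed by the data it scans
SELECT user_id, count(*) AS events
FROM raw_events
WHERE event_time >= timestamp '2017-03-01 00:00:00'
GROUP BY user_id
ORDER BY events DESC
LIMIT 10;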

Tableau fed by Presto with S3 Parquets

By: Albert Franzi
On: 5th October 2016
In: AWS, BigData

Tableau + Presto & S3 Parquets: Clickstream data is growing fast, day by day, as more sites and features are added. Our team provides new aggregated entities on top of it, meaning more data is produced. From Clickstream events we are able to produce Sessions, Visitors and Page Views objects, which are stored in S3 in Parquet format. In order to stay up to date with new technologies, we wanted to do a PoC with Presto, since some companies such as Airbnb are using it instead of Redshift, our current solution. In this post, I’m going to explain my experience with Presto. First of all, we need… Read More →
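For context, Presto usually reads those Parquet objects through its Hive connector; the catalog, schema, table and column names below are illustrative assumptions rather than the ones used in the post.

-- Daily sessions and unique visitors from Parquet files in S3, via the Hive catalog
SELECT date_trunc('day', started_at)  AS day,
       count(*)                       AS sessions,
       count(DISTINCT visitor_id)     AS visitors
FROM hive.clickstream.sessions
WHERE started_at >= date '2016-09-01'
GROUP BY 1
ORDER BY 1;

A view like this is the kind of result set a Tableau dashboard can then be pointed at through its Presto connector.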

Recent Posts

  • Analyzing Redshift Spectrum ‘Patronus’
  • Creating a Redshift Sandbox for our Analysts
  • Learning Session – AWS Athena
  • Checking your Redshift users
  • Tableau fed by Presto with S3 Parquets

Categories

  • AWS
  • BigData
  • MapReduce
  • Project Management
  • Solr
  • Spark
  • Tech Staff
  • Uncategorised


© 2015 - 2019 Efimeres