Sunday, 20 December 2015

What is Encoding and Compression in Vertica?



Data Encoding and Compression:
        HP Vertica uses both encoding and compression to optimize query performance and save storage space. However, in your HP Vertica database, they have different meanings.

Encoding:
        Encoding is the process of converting data into a standard format. In HP Vertica, encoded data can be processed directly, but compressed data cannot. HP Vertica uses a number of different encoding strategies, depending on column data type, table cardinality, and sort order. Encoding increases performance because there is less disk I/O during query execution. In addition, you can store more data in less space. Encoding is not the same as compression.

Compression:
         Compression is the process of transforming data into a compact format. Compressed data cannot be directly processed; it must first be decompressed. HP Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in HP Vertica.


For More Info: Click Hiere

Saturday, 19 December 2015

What is Mergeout in Vertica?



    A mergeout is the process of consolidating ROS containers and purging deleted records. Over time, the number of ROS containers increases to a degree that it becomes necessary to merge some of them in order to avoid performance degradation. At that point, the Tuple Mover performs an automatic mergeout, which combines two or more ROS containers into a single container. This process can be thought of as "defragmenting" the ROS.


For More Info: Click Here

What is ROS in Vertica?



     Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure. The ROS makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with /*+DIRECT*/ hints) statements to load data directly into the ROS.

What is WOS in Vertica?



    Write Optimized Store (WOS) is a memory-resident data structure for storing INSERT, UPDATE, DELETE, and COPY (without /*+DIRECT*/ hints) actions. To support very fast data load speeds, the WOS stores records without data compression or indexing. The WOS organizes data by epoch and holds both committed and uncommitted transaction data

What is Tuple Mover (TM) in Vertica?



    The Tuple Mover (TM) is the database optimizer component of Vertica that moves data from memory (WOS) to disk (ROS), combines small ROS containers into larger ones, and purges deleted data. The Tuple Mover runs in the background. Under ordinary circumstances, the operations performed by the Tuple Mover (TM) are automatic and transparent, and are therefore of little or no concern to the database administrator.


For More info: Click Here

How to monitor Copy Command?



You can check COPY loads using HP Vertica “functions” and “LOAD_STREAMS system table”.

HP Vertica Functions:

1. GET_NUM_ACCEPTED_ROWS : 
To get the number of accepted rows, use the GET_NUM_ACCEPTED_ROWS function

Query:
SELECT GET_NUM_ACCEPTED_ROWS()

2. GET_NUM_REJECTED_ROWS : 
To check the number of rejected rows, use the GET_NUM_REJECTED_ROWS function

Query:
SELECT GET_NUM_REJECTED_ROWS()

3.  Include the CURRENT_LOAD_SOURCE function as a part of the COPY statement:
When you include the CURRENT_LOAD_SOURCE function as a part of the COPY statement, the input file name or value computed from it, can be inserted into a column.

To insert the file names into a column from multiple source files:
COPY t (c1, c2, c3 as CURRENT_LOAD_SOURCE()) FROM '/home/load_file_1' ON  exampledb_node02,
'/home/load_file_2' ON exampledb_node03 DELIMITER ',';


LOAD_STREAMS System Table:
    HP Vertica includes a set of system tables that include monitoring information, as described in Using System Tables. The LOAD_STREAMS system table includes information about load stream metrics from COPY and COPY FROM VERTICA statements, so you can query table values to get COPY metrics.

To see all table columns:
SELECT * FROM LOAD_STREAMS;

For More Info: Click Here

What is Copy Command in Vertica?



     Bulk loads data from one or more files or pipes on a cluster host into a Vertica database. (See LCOPY to load from a data file on a client system using ODBC.)
COPY can load data in one of three formats:
  • Text with delimiters (the default format)
  • Native binary using the NATIVE keyword
  • Native varchar using the NATIVE VARCHAR keyword


For More Info: Click Here