Sunday, 20 December 2015

What is Encoding and Compression in Vertica?



Data Encoding and Compression:
        HP Vertica uses both encoding and compression to optimize query performance and save storage space. However, in your HP Vertica database, they have different meanings.

Encoding:
        Encoding is the process of converting data into a standard format. In HP Vertica, encoded data can be processed directly, but compressed data cannot. HP Vertica uses a number of different encoding strategies, depending on column data type, table cardinality, and sort order. Encoding increases performance because there is less disk I/O during query execution. In addition, you can store more data in less space. Encoding is not the same as compression.

Compression:
         Compression is the process of transforming data into a compact format. Compressed data cannot be directly processed; it must first be decompressed. HP Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in HP Vertica.


For More Info: Click Hiere

Saturday, 19 December 2015

What is Mergeout in Vertica?



    A mergeout is the process of consolidating ROS containers and purging deleted records. Over time, the number of ROS containers increases to a degree that it becomes necessary to merge some of them in order to avoid performance degradation. At that point, the Tuple Mover performs an automatic mergeout, which combines two or more ROS containers into a single container. This process can be thought of as "defragmenting" the ROS.


For More Info: Click Here

What is ROS in Vertica?



     Read Optimized Store (ROS) is a highly optimized, read-oriented, disk storage structure. The ROS makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with /*+DIRECT*/ hints) statements to load data directly into the ROS.

What is WOS in Vertica?



    Write Optimized Store (WOS) is a memory-resident data structure for storing INSERT, UPDATE, DELETE, and COPY (without /*+DIRECT*/ hints) actions. To support very fast data load speeds, the WOS stores records without data compression or indexing. The WOS organizes data by epoch and holds both committed and uncommitted transaction data

What is Tuple Mover (TM) in Vertica?



    The Tuple Mover (TM) is the database optimizer component of Vertica that moves data from memory (WOS) to disk (ROS), combines small ROS containers into larger ones, and purges deleted data. The Tuple Mover runs in the background. Under ordinary circumstances, the operations performed by the Tuple Mover (TM) are automatic and transparent, and are therefore of little or no concern to the database administrator.


For More info: Click Here

How to monitor Copy Command?



You can check COPY loads using HP Vertica “functions” and “LOAD_STREAMS system table”.

HP Vertica Functions:

1. GET_NUM_ACCEPTED_ROWS : 
To get the number of accepted rows, use the GET_NUM_ACCEPTED_ROWS function

Query:
SELECT GET_NUM_ACCEPTED_ROWS()

2. GET_NUM_REJECTED_ROWS : 
To check the number of rejected rows, use the GET_NUM_REJECTED_ROWS function

Query:
SELECT GET_NUM_REJECTED_ROWS()

3.  Include the CURRENT_LOAD_SOURCE function as a part of the COPY statement:
When you include the CURRENT_LOAD_SOURCE function as a part of the COPY statement, the input file name or value computed from it, can be inserted into a column.

To insert the file names into a column from multiple source files:
COPY t (c1, c2, c3 as CURRENT_LOAD_SOURCE()) FROM '/home/load_file_1' ON  exampledb_node02,
'/home/load_file_2' ON exampledb_node03 DELIMITER ',';


LOAD_STREAMS System Table:
    HP Vertica includes a set of system tables that include monitoring information, as described in Using System Tables. The LOAD_STREAMS system table includes information about load stream metrics from COPY and COPY FROM VERTICA statements, so you can query table values to get COPY metrics.

To see all table columns:
SELECT * FROM LOAD_STREAMS;

For More Info: Click Here

What is Copy Command in Vertica?



     Bulk loads data from one or more files or pipes on a cluster host into a Vertica database. (See LCOPY to load from a data file on a client system using ODBC.)
COPY can load data in one of three formats:
  • Text with delimiters (the default format)
  • Native binary using the NATIVE keyword
  • Native varchar using the NATIVE VARCHAR keyword


For More Info: Click Here

Saturday, 28 November 2015

What ANALYZE_STATISTICS will do?



         ANALYZE_STATISTICS is a DDL operation that auto-commits the current transaction, if any. The ANALYZE_STATISTICS function reads a fixed, 10 percent of disk contents to aggregate sample data for statistical analysis. To obtain a larger (or smaller) data sampling, use the ANALYZE_HISTOGRAM function, which lets you specify the percent of disk to read. Analyzing more that 10 percent disk space takes proportionally longer to process but results in a higher level of sampling accuracy. ANALYZE_STATISTICS is supported on local temporary tables, but not on global temporary tables.


For more information: Click Here

Saturday, 21 November 2015

What is DBD in Vertica?



    The Database Designer is a tool that recommends a physical database design (projections) that provides the best performance to answer your query needs. This is useful because it minimizes the time the DBA spends on physical database tuning and provides the ability to re-design the database incrementally to optimize for changing workloads over time.
You can run the Database Designer before or after a Vertica database has been deployed, and it runs as a background process.
When you use the Database Designer to create a design, the DBA provides the following inputs:
·                     Logical schema (CREATE TABLE statements)
·                     Sample data
·                     A sample set of queries that represent what you'd normally run
·                     A K-safety level

For More Info:
https://my.vertica.com/docs/5.0/HTML/Master/14415.htm

What is Hash-Segmentation and Unsegmented?




Hash-Segmentation:

    Segments a projection evenly and distributes the data across nodes using a built-in hash function. Creating a projection with hash segmentation results in optimal query execution. HP Vertica recommends segmenting large tables.


Unsegmented:


    Automatically replicates the unsegmented projection on each node. To perform distributed query execution, HP Vertica requires an unsegmented copy of each small table superprojection on each node.

What is K-Safe in Vertica?



      Vertica uses the concept of K-Safety for failure recovery. The K value represents the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica, the value of K can be zero (0), one (1), or two (2). The Physical Schema design must meet certain requirements. To create designs that are K-Safe, Vertica recommends using the Database Designer.
To determine the number of nodes required to meet your K-Safety needs, use the formula: K<N/2
where :
·                     K is the K-Safety value.
·                     N is the number of nodes.
The value of K can be 1 or 2 only when the physical schema design meets certain redundancy requirements. See Physical Schema. To create designs that are K-Safe, Vertica recommends that you use the Description: https://my.vertica.com/docs/4.1/HTML/Master/arrowright.gifDatabase Designer.

By default, Vertica creates K-Safe superprojections when the database has a K-Safety greater than 0 (K>0). When creating projections with the Database Designer, projection definitions that meet K-Safe design requirements are recommended and marked with the K-Safety level. Note the output from running the optimized design script generated by the Database Designer in the following example:

Table and Projection are physical or logical?



Table:
    Tables do not occupy any physical storage at all in Vertica.

Projection:
    Physical storage consists of collections of table columns called projections.


Sunday, 15 November 2015

What is Projection?





Optimized collections of table columns that provide physical storage for data. A projection can contain some or all of the columns of one or more tables. A projection that contains all of the columns of a table is called a super-projection. A projection that joins one or more tables is called a pre-join projection.

Vertica Interview Questions



  1. What is WOS in Vertica?
  2. What is ROS in Vertica?
  3. What is Tuple Mover (TM) in Vertica? 
  4. What is Projection?
  5. Table and Projection are physical or logical?
  6. What is K-Safe in Vertica?
  7. What is Hash-Segmentation and Unsegmented?
  8. What is DBD in Vertica?
  9. What ANALYZE_STATISTICS will do?
  10. What is Copy Command in Vertica?
  11. Howto monitor Copy Command?
  12. What is Mergeout in Vertica?
  13. What is Encoding and Compression in Vertica?
  14. How to find currently running sessions?
  15. How to Close/Kill currently running session?


See also:

Friday, 29 May 2015

File Loading Issue using copy cmd in Vertica ERROR 2344:




       ERROR 2344: Cannot allocate sufficient memory for COPY statement (536870912 requested, 268435456 permitted)



   The above error occurred  while loading the file data into Vertica DB using copy command (even after putting rejected data file).


Solution:

  • First check the Source file type.
    • Open the file in "Notepad + +". In the bottom side we can find the file type (eg: Dos\Windows)
  
    •  Ideally the file type should be "Dos\Windows".
    • If the file type is different the we need to convert it.

File conversion using "Notepad + +":
  • Open the file in "Notepad + +".
  • Go to  "Edit" and select "EOL Conversion" then select the required format.
  • Save the file.

Wednesday, 8 April 2015

How to copy file data into Vertica database



   Using Copy command we can copy file data into Vertica database.

Syntax:
COPY <Table Name> FROM LOCAL '<File Path>' DIRECT DELIMITER '<Delimiter>' ENCLOSED BY '"';

Example:
COPY Sample_Table FROM LOCAL '/DataFiles/Sample.txt' DIRECT DELIMITER '|' ENCLOSED BY '"';