Hive 0.8.0 provides support for two virtual columns:
One is INPUT__FILE__NAME, which is the input file's name for a mapper task.
The other is BLOCK__OFFSET__INSIDE__FILE, which is the current global file position.
For a block-compressed file, it is the current block's file offset, that is, the file offset of the block's first byte.
Since Hive 0.8.0 the following virtual columns have been added:
It is important to note that none of the virtual columns listed here can be used for any other purpose (for example, creating a table with a column named after a virtual column will fail with "SemanticException Error 10328: Invalid column name..").
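As a sketch of this restriction, the statement below tries to create a table with a column that reuses a virtual column name. The table name vc_demo is hypothetical. On Hive versions that enforce the check, the statement is expected to be rejected with the SemanticException quoted above instead of creating the table.

```sql
-- Hypothetical table name. The first column name collides with a virtual
-- column, so Hive is expected to reject this statement at compile time.
create table vc_demo (INPUT__FILE__NAME string, key string);
```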
select BLOCK__OFFSET__INSIDE__FILE from src;
select key, count(INPUT__FILE__NAME) from src group by key order by key;
select * from src where BLOCK__OFFSET__INSIDE__FILE > 12000 order by key;
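One common use of these columns is tracing rows back to the files they were read from. A minimal sketch, assuming the same src table as the queries above, counts the rows per input file and reports the smallest offset seen in each file. The aliases row_count and first_offset are illustrative names only.

```sql
-- Rows per input file, with the smallest block offset read from each file.
select INPUT__FILE__NAME,
       count(*) as row_count,
       min(BLOCK__OFFSET__INSIDE__FILE) as first_offset
from src
group by INPUT__FILE__NAME;
```

Grouping by INPUT__FILE__NAME can help locate a file that contributes unexpected or malformed rows.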