My table lead has an index:
\d lead
...
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
...
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Why isn't PG (9.1) using the index for this simple equality select? The emails are almost all unique.
db=> explain select * from lead where email = 'blah';
QUERY PLAN
------------------------------------------------------------
Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108)
Filter: (email = 'blah'::text)
(2 rows)
Other queries that hit an index seem fine (though I don't know why this one doesn't simply use the pkey index):
db=> explain select * from lead where id = '';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using lead_id_prefix on lead (cost=0.00..8.57 rows=1 width=5108)
Index Cond: (id = ''::text)
(2 rows)
db=> explain select * from lead where account__c = '';
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using lead_account__c on lead (cost=0.00..201.05 rows=49 width=5108)
Index Cond: (account__c = ''::text)
(2 rows)
At first I thought this might be because email has too few distinct values. For instance, if the statistics said that email is blah for most of the table, a sequential scan would be faster. But that's not the case:
db=> select count(*), count(distinct email) from lead;
count | count
--------+--------
749148 | 733416
(1 row)
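The planner works from sampled statistics rather than from exact counts, so the same hypothesis can also be checked against what the planner actually sees. A minimal sketch using the standard pg_stats view, assuming the table lives in the public schema (as the dump further down suggests):

```sql
-- Planner statistics for lead.email.
-- Assumption: the table is in the "public" schema.
-- n_distinct < 0 means "fraction of rows that are distinct" (-1 = all unique).
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'lead'
  AND attname    = 'email';
```

If the row is missing or looks stale, running ANALYZE lead; refreshes it. The rows=1 estimate in the plan above already suggests the statistics are not the problem here.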
Even if I force sequential scans off, the planner behaves as if it has no other choice:
db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
enable_seqscan
----------------
off
(1 row)
db=> explain select * from lead where email = '[email protected]';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319599.38 rows=1 width=5108)
Filter: (email = '[email protected]'::text)
(2 rows)
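The 10000000000 added to the cost is the penalty the planner applies to a disabled plan type, so this output means it picked a sequential scan even while penalizing it, i.e. it did not consider any index path usable. To repeat the experiment without leaving the setting changed for the rest of the session, a sketch like the following could be used (the literal 'blah' is just a placeholder value):

```sql
-- Disable seq scans only inside this transaction; the setting reverts on ROLLBACK.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN SELECT * FROM lead WHERE email = 'blah';
ROLLBACK;
```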
I also tried EXPLAIN ANALYZE:
db=> explain analyze select * from lead where email = '[email protected]';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Seq Scan on lead (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
Filter: (email = '[email protected]'::text)
Total runtime: 77857.215 ms
(3 rows)
Here is the output of \d (sorry, I had to obscure the column names and truncate it to fit within SO's limits; see the untruncated version at http://pastebin.com/ve3gzJpY):
Table "lead"
Column | Type | Modifiers
--------------------------------------------+-----------------------------+-----------
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
email | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
account__c | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text |
id | text | not null
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real |
Indexes:
"lead_pkey" PRIMARY KEY, btree (id)
"lead_account__c" btree (account__c)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
"lead_email" btree (email)
"lead_id_prefix" btree (id text_pattern_ops)
Here is the output of pg_dump --schema-only -t lead (again, see the untruncated version at http://pastebin.com/ve3gzJpY, with unique column names, in case that helps reproducibility):
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace:
--
CREATE TABLE lead (
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
account__c text,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
id text NOT NULL,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);
ALTER TABLE lead OWNER TO pod;
--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace:
--
ALTER TABLE ONLY lead
ADD CONSTRAINT lead_pkey PRIMARY KEY (id);
--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_account__c ON lead USING btree (account__c);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);
--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_email ON lead USING btree (email);
--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace:
--
CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);
--
-- PostgreSQL database dump complete
--
Some PG catalog incantations:
db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
indexrelid | indrelid | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
215251995 | 101034456 | 1 | f | f | f | t | f | t | t | t | 101 | 100 | 10043 | 0 | ¤ | ¤
(1 row)
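The wide row above is hard to read. A narrower query, sketched here against the same pg_index catalog, lists only the flags that the catalog documentation ties to index usability (indisvalid, indisready, indcheckxmin) and shows them for every index on the table so that lead_email can be compared with the indexes that do get used:

```sql
-- Usability flags for every index on lead.
-- indisvalid:   index is currently valid for queries
-- indisready:   index is currently ready for inserts
-- indcheckxmin: queries must not use the index until the xmin of this
--               pg_index row is older than their snapshot horizon
SELECT indexrelid::regclass AS index_name,
       indisvalid,
       indisready,
       indcheckxmin
FROM pg_index
WHERE indrelid = 'lead'::regclass;
```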
Some locale information:
db=> show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
db=> show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
I have searched through a good number of past SO questions, but none were about a simple equality query like this one.
Comments:
[...] text_pattern_ops, so this is hard to explain. Can you reproduce this on a small sample? If so, please post it to sqlfiddle.com and link to it here. - Craig Ringer, 12.04.2013
[...] pg_dump). - Peter Eisentraut, 12.04.2013
[...] \d and pg_dump. - Yang, 12.04.2013
[...] lc_collate / etc. and updating the column names in the pastebin. - Yang, 13.04.2013