Tuesday, 4 December 2012

MySQL: Split a comma-separated list and insert result into table

Looking for a SPLIT-function in MySQL, I came across this one. I tried it and I must have have done something not right, because MySQL threw an error at the function. I am not a MySQL guru and since this is a one time Q&D conversion-action, I only took the SUBSTRING code and created a query with which one can split the contents of an old field into separate columns and directly insert the results into a new, normalized table.

My example is about a TEXT-column I want to get rid of and of which I want to transfer the contents to a separate table. This column contains email addresses separated by a comma. Thus, first, I had to find the maximum number of email addresses used in that column, so I found this query and added MAX() around it.

select max(length(emails) - length(replace(emails, ',', ''))) as occurrences
from old_table
where emails<>''


With that number, I created that number+1 of unions, so I would end up with all email addresses in one column. That select statement is then used in a left join to retrieve the corresponding user name and feed the results at the same time into a new table, which uses an ID and a USER-ID, instead of an email address:

insert into new_table
select idnr, user
from (
  select idnr,
  trim(substring(substring_index(emails, ',', 1), char_length(substring_index(emails, ',', 1 -1)) + 1)) as email
  from old_table
  where emails<>''

  union

  select idnr,
  trim(substring(substring_index(emails, ',', 2), char_length(substring_index(emails, ',', 2 -1)) + 2)) as email
  from old_table
  where emails<>''

  union

  select idnr,
  trim(substring(substring_index(emails, ',', 3), char_length(substring_index(emails, ',', 3 -1)) + 2)) as email
  from old_table
  where emails<>''

  union

  select idnr,
  trim(substring(substring_index(emails, ',', 4), char_length(substring_index(emails, ',', 4 -1)) + 2)) as email
  from old_table
  where emails<>''
) as x
join users u on (u.email1=x.email or u.email2=x.email)
where x.email<>''


Now that I have all used email address associated with the IDs of the original rows, I can now delete the old column and change all my LIKE-queries into LEFT JOINs. Much better, because email addresses change.