The All Roads Project – Ingesting the Bible

Introduction

It’s Easter Time! This is the perfect time to get back to the All Roads Project. For those new to this project, it is a combination of analytics and Religious Research specifically related Catechism of the Catholic Church (CCC). The idea is to be able to develop a database that contains the texts and the references.

These components can then be linked or networked together to show a lineage. Questions can be asked to determine the source of each teaching within the CCC. Like previously mentioned, these roads lead to God.

In this article, I will focus on the Bible texts. The steps will take us from data acquisition through to the data loading/ingestion process. The end result is the database will contain all text and references/citations from the Bible.

Data Acquisition

The previous time I talked about data acquisition for the Bible¹, I noted how I would use the source from the United States Conference of Catholic Bishops (USCCB). Unfortunately, I hit a roadblock when parsing out the references. Thankfully, the Vatican has a copy of the same New American Bible located on their website. This will actually be an easier effort since we used the Vatican’s website format for the CCC.

The acquisition of the text was done using the same “Low Code” method using Apple Shortcuts discussed in a previous article. It worked very well and produced a nice list of URLs that were scraped into text files.

Parsing the Bible Texts

As with the CCC, I used the awk language to parse out the headers, pages, and references. The code is in the appendix section below and it is very similar to what I did with the CCC. Here is the format for the output files (pipe delimited).

Headers File

File Name
Group Name
Book Name
Chapter Name
Page Line #
Header line #
Header Text

Pages File

File Name
Group Name
Book Name
Chapter Name
Page Line #
Paragraph line #
Paragraph Id
Text
References

References

File Name
Group Name
Book Name
Chapter Name
Page line number
Paragraph Id
Reference line number
Reference Id from text
References

With the text in a format that can be read into a database, let’s stage everything into tables.

Staging

Because the Bible data is formatted differently than the CCC, I decided to create separate staging tables. These tables are:

Bible_Pages_Stage_T
Bible_Header_Stage_T
Bible_Paragraphs_Stage_T
Bible_References_Stage_T

The following are the Source To Target Mappings (STTM) for each stage table:

STTM For Bible_Pages_Stage_T

Source File	Source Field Position	Transformation	Target Database	Target Table	Target Field
extracted_urls.vatican.Bible.all.txt	1	None	AllRoads	Bible_Pages_Stage_T	File_Name
extracted_urls.vatican.Bible.all.txt	2	None	AllRoads	Bible_Pages_Stage_T	Title
extracted_urls.vatican.Bible.all.txt	3	None	AllRoads	Bible_Pages_Stage_T	URL

STTM For Bible_Header_Stage_T

Source File	Source Field Position	Transformation	Target Database	Target Table	Target Field
Bible.all.h.txt	1	None	AllRoads	Bible_Header_Stage_T	File_Name
Bible.all.h.txt	2	None	AllRoads	Bible_Header_Stage_T	Group_Name
Bible.all.h.txt	3	None	AllRoads	Bible_Header_Stage_T	Book_Name
Bible.all.h.txt	4	None	AllRoads	Bible_Header_Stage_T	Chapter_Name
Bible.all.h.txt	5	None	AllRoads	Bible_Header_Stage_T	Page_Line_Number
Bible.all.h.txt	6	None	AllRoads	Bible_Header_Stage_T	Header_Line_Number
Bible.all.h.txt	7	None	AllRoads	Bible_Header_Stage_T	Header_Text

STTM for Bible_Paragraphs_Stage_T

Source File	Source Field Position	Transformation	Target Database	Target Table	Target Field
Bible.all.p.txt	1	None	AllRoads	Bible_Paragraphs_Stage_T	File_Name
Bible.all.p.txt	2	None	AllRoads	Bible_References_Stage_T	Group_Name
Bible.all.p.txt	3	None	AllRoads	Bible_References_Stage_T	Book_Name
Bible.all.p.txt	4	None	AllRoads	Bible_References_Stage_T	Chapter_Name
Bible.all.p.txt	5	None	AllRoads	Bible_Paragraphs_Stage_T	Page_Line_Number
Bible.all.p.txt	6	None	AllRoads	Bible_Paragraphs_Stage_T	Paragraph_Line_Number
Bible.all.p.txt	7	None	AllRoads	Bible_Paragraphs_Stage_T	Paragraph_Id
Bible.all.p.txt	8	None	AllRoads	Bible_Paragraphs_Stage_T	Paragraph_Text
Bible.all.p.txt	9	None	AllRoads	Bible_Paragraphs_Stage_T	References

STTM for Bible_References_Stage_T

Source File	Source Field Position	Transformation	Target Database	Target Table	Target Field
Bible.all.r.txt	1	None	AllRoads	Bible_References_Stage_T	File_Name
Bible.all.r.txt	2	None	AllRoads	Bible_References_Stage_T	Group_Name
Bible.all.r.txt	3	None	AllRoads	Bible_References_Stage_T	Book_Name
Bible.all.r.txt	4	None	AllRoads	Bible_References_Stage_T	Chapter_Name
Bible.all.r.txt	5	None	AllRoads	Bible_References_Stage_T	Page_Line_Number
Bible.all.r.txt	6	None	AllRoads	Bible_References_Stage_T	Paragraph_Id
Bible.all.r.txt	7	None	AllRoads	Bible_References_Stage_T	Reference_Line_Number
Bible.all.r.txt	8	None	AllRoads	Bible_References_Stage_T	Reference_Id
Bible.all.r.txt	9	None	AllRoads	Bible_References_Stage_T	Reference_Text

Remember to always check counts from local file to staging at this point. You can do that through your favorite tool of choice for the command line (or loading into an editor) then comparing that to counts using SQL.

Loading from Staging

With the data staged, we can write SQL to do the loads to the final table. Remember, the landing tables already include the CCC data. So, don’t truncate the tables if you hit an error. You will have to excise any erroneous records with DELETE statements if necessary. That said, you shouldn’t have to do that if you properly design and test your loads out of staging.

For design, remember the STTMs are essential! Always think and write out your process before building the load. Here are the STTMs for each table.

STTM for Texts_T

Source Table	Source Field	Data Type	Transformation	Target Table	Target Field	Date Type
			System Generated Id	Texts_T	Text_Id	integer
Bible_Header_Stage_T	Title		N/A	Texts_T	Title	character varying (1000)
Bible_Header_Stage_T	Title		N/A	Texts_T	Description	text
Bible_Pages_Stage_T	URL		N/A	Texts_T	URL	character varying (2048)
Bible_Header_Stage_T	Group_Name		N/A	Texts_T	Group	character varying (50)
			Set to “Bible”	Texts_T	Type	character varying (50)
			Set to CURRENT_DATE	Texts_T	Audit_Create_Date	timestamp
			Set to “Andy”	Texts_T	Audit_Create_User	character varying (100)
			N/A	Texts_T	Audit_Update_Date	timestamp
			N/A	Texts_T	Audit_Update_User	character varying (100)
*Note: JoinHeader and Pages stage tables by File_Name
Sort Group_Name according to the following rules: when ‘The Pentateuch’ then 1 when ‘The Historical Books’ then 2 when ‘The Wisdom Books’ then 3 when ‘The Prophetic Books’ then 4 when ‘The Gospels’ then 5 when ‘New Testament Letters’ then 6 when ‘The Catholic Letters’ then 7

STTM for Chapters_T

Source Table	Source Field	Data Type	Transformation	Target Table	Target Field	Date Type
			System Generated Id	Chapters_T	Chapter_Id	integer
Bible_Pages_Stage_T	URL			Chapters_T	URL	character varying (2048)
Bible_Header_Stage_T and Bible_Pages_Stage_T	Book_Name (from headers) and Title (from pages)		h.”Book_Name” \|\| ‘ ‘ \|\| pg.”Title”	Chapters_T	Title	character varying (1000)
Bible_Header_Stage_T	Book_Name and Chapter_Name		h.”Book_Name” \|\| ‘ – ‘ \|\| h.”Chapter_Name”	Chapters_T	Description	text
Texts_T	Text_id			Chapters_T	Text_Id	Integer
			Set to CURRENT_DATE	Chapters_T	Audit_Create_Date	timestamp
			Set to “Andy”	Chapters_T	Audit_Create_User	character varying (100)
			N/A	Chapters_T	Audit_Update_Date	timestamp
			N/A	Chapters_T	Audit_Update_User	character varying (100)
Join 1: header to pages by File_Name
Join 2: header to Texts_T by Group and Title

STTM for Pages_T

Source Table	Source Field	Data Type	Transformation	Target Table	Target Field	Date Type
			System Generated Id	Pages_T	Page_Id	integer
			Set to ROW_NUMBER() by sorting on the File_Name.	Pages_T	Page_Sequence_Id	integer
Bible_Header_Stage_T and Bible_Pages_Stage_T	TItle	character varying (1000)	COALESCE(h.”Book_Name” \|\| ‘ ‘ \|\| pg.”Title”, pg.”Title”)	Pages_T	Title	character varying (1000)
Bible_Pages_Stage_T and Chapters_T		text	COALESCE(c.”Description”, pg.”Title”)	Pages_T	Description	text
Bible_Pages_Stage_T	URL	character varying (2048)	N/A	Pages_T	URL	character varying (2048)
Bible_Pages_Stage_T	File_Name	character varying (1000)	N/A	Pages_T	File_Name	character varying (1000)
Texts_T	Text_Id	integer	Set to default value (-1) if null	Pages_T	Text_Id	integer
Chapters_T	Chapter_Id	integer	Set to default value (-1) if null	Pages_T	Chapter_Id	integer
			Set to CURRENT_DATE	Pages_T	Audit_Create_Date	timestamp
			Set to “Andy”	Pages_T	Audit_Create_User	character varying (100)
			N/A	Pages_T	Audit_Update_Date	timestamp
			N/A	Pages_T	Audit_Update_User	character varying (100)
Join 1: LEFT OUTER between header and pages according to File_Name
Join 2: LEFT OUTER between chapters and pages according to URL
Join 3: LEFT OUTER between chapters and texts according to Text_Id

STTM for References_T

Source Table	Source Field	Data Type	Transformation	Target Table	Target Field	Date Type
			System Generated Id	References_T	Reference_Id	integer
Bible_References_Stage_T	Reference_Id	text		References_T	Reference_Sequence_Id	character varying (1000)
Bible_References_Stage_T	Reference_Text	text	If no text, set to “No Reference Text”	References_T	Reference_Text	text
			Set to NULL	References_T	URL	character varying (2048)
Pages_T	Page_Id	integer	N/A	References_T	Page_Id	integer
Pages_T	Text_Id	integer	N/A	References_T	Text_Id	integer
			Set to CURRENT_DATE	References_T	Audit_Create_Date	timestamp
			Set to “Andy”	References_T	Audit_Create_User	character varying (100)
			N/A	References_T	Audit_Update_Date	timestamp
			N/A	References_T	Audit_Update_User	character varying (100)
Join: references to pages by File_Name
*Note: Sort by Text_id, Page_Id and Reference_Id in an ascending order

STTM for Paragraphs_T

Source Table	Source Field	Data Type	Transformation	Target Table	Target Field	Date Type
			System Generated Id	Paragraphs_T	Paragraph_Id	integer
Bible_Paragraphs_Stage_T	Paragraph_Id	bigint	When Paragraph_Id = ‘-1’ then set to NULL	Paragraphs_T	Paragraph_Sequence_Id	character varying (1000)
Bible_Paragraphs_Stage_T	Paragraph_Text	text		Paragraphs_T	Paragraph_Text	text
Pages_T	URL	character varying (2048)		Paragraphs_T	URL	character varying (2048)
Bible_Paragraphs_Stage_T	References	text	Convert comma delimited string to json using to_json	Paragraphs_T	Reference_Sequence_Ids	json
Pages_T	Text_Id	integer		Paragraphs_T	Text_Id	integer
Pages_T	Page_id	integer		Paragraphs_T	Page_Id	integer
			Set to CURRENT_DATE	Paragraphs_T	Audit_Create_Date	timestamp
			Set to “Andy”	Paragraphs_T	Audit_Create_User	character varying (100)
			N/A	Paragraphs_T	Audit_Update_Date	timestamp
			N/A	Paragraphs_T	Audit_Update_User	character varying (100)
Join: paragraphs to pages according to File_Name
*NOTE: Sort by Paragraph_Line_Number in an ascending order

We have now loaded the Bible into the landing tables. Again, please remember that you must run data checks at this point. Ensure that the counts between the staging tables and the landing tables match. There should be a 1:1 in terms of counts between them for the Bible. If you have discrepancies, then go back and double check the loading SQL for accuracy.

Next Steps

With the CCC and Bible data now loaded, we have a very good start for analysis. It is a great success at this point but much more work is needed. There will also need to be further data sources for future work.

In the next article, I will go through some introductory analysis of creating relationships between paragraphs and references. This will be challenging as it is across more than 1 text as opposed to looking up the citations/references within same text.

Happy Easter! He is Risen!

References

[1] God. The New American Bible. Distributed by Vatican. Retrieved from: https://www.vatican.va/archive/ENG0839/_INDEX.HTM

Appendix

ALL CODE IS CONSIDERED IN AS-IS CONDITION. USE AT YOUR OWN CAUTION.

awk Script

			
# Author: Andy Della Vecchia
# Purpose: Parse out Bible text into header, paragraphs, and reference files
BEGIN {
  # these are used to track where the read is in the file
  section_number = 0
  blank_line_count = 0
  subsection_number = 0
  page_line_number = 0
  paragraph_id = -1
  h_line_number = 0
  p_line_number = 0
  r_line_number = 0
  
  # get the file name being processed
  split(ARGV[1], file_pieces, ".")
  split(file_pieces[length(file_pieces)-2], file_number, "/")
  fn = "Bible." file_number[length(file_number)]
  # build out the individual file names (h=header; p=paragraph; r=references)
  out_path = "./out/"
  header_file = out_path fn ".h.txt"
  paragraph_file = out_path fn ".p.txt"
  reference_file = out_path fn ".r.txt"
}
{
  # track line number for debugging purposes
  page_line_number++
  # when the "Previous - Next" is found
  if ($0 ~ /Previous - Next/ || $0 ~ /IntraText - Text/) {
    # Track the section
    section_number++
   
  } else {
    # section 1 is the header section
    if (section_number == 1 && NF != 0) {
       # 1 - File Name
       # 2 - Group Name
       # 3 - Book Name
       # 4 - Chapter Name
       # 5 - Page Line # (sequential for each page)
       # 6 - Header line # (sequential for each header line #)
       # 7 - Header Text
       h_line_number++
       gsub(/[^a-zA-Z0-9 ]/, "")
       if (h_line_number == 1) {
           group_name = $0
       } else if (h_line_number == 2) {
           book_name = $0
       } else if (h_line_number == 3) {
           chapter_name = $0
       }
       print fn "|" group_name "|" book_name "|" chapter_name "|" page_line_number "|" h_line_number "|" $0 > header_file
    # section 2 is the main section with the paragraph and references
    } else if (section_number == 2) {
	# if not a blank line and numbers or text exist
	if (NF != 0 && ($0~/[0-9]+/ || $0~/[A-Za-z+]/)) {
#print $0 > paragraph_file
		# subsection 1 is the paragraphs
		if (subsection_number == 0 && !($0 ~ /Click here to show the links to concordance/) && !($0==group_name) && !($0==book_name) && !($0==chapter_name)) {
			# split the line
			# 1 - File Name
		        # 2 - Group Name
       			# 3 - Book Name
       			# 4 - Chapter Name
			# 5 - Page Line # (sequential for each page)
                        # 6 - Paragraph line # (sequential for each paragraph #)
			# 7 - Paragraph Id (if doesn't exist, -1)
			# 8 - Text (including reference #s)
			# 9 - References - comma delimited
			p_line_number++
			ref_nums = ""
			#if the entire line is a number only, save it as the paragraph line number
			if ($0 ~ /[0-9]+/ && !($0 ~/[A-Za-z+]/)) {
				paragraph_id = $0
			} else {
				# find any numeric values and pull out as reference Ids
				for (i=1; i<=NF; i++) {
	
				   if ($i ~ /[0-9]+/) {
		
					bkp_fld = $i
					gsub(/[^0-9+]/, "", $i)
	
					if (ref_nums != "") {
					   ref_nums = ref_nums "," $i
					} else {
					   ref_nums = $i
					}
	
					# make sure original field is restored
					#$i = bkp_fld
				   }
				}
				# add on the reference numbers and print out to file
				print fn "|" group_name "|" book_name "|" chapter_name "|" page_line_number "|" p_line_number "|" paragraph_id "|" $0 "|" ref_nums > paragraph_file
			}
		# subsection 2 is where the references are stored
		} else if (subsection_number >= 1) {
			# 1 - File Name
		        # 2 - Group Name
       			# 3 - Book Name
       			# 4 - Chapter Name
                        # 5 - Page line number
			# 6 - Paragraph Id
                        # 7 - reference line number
			# 8 - Reference Id from text
			# 9 - References
			r_line_number++
			$1 = $1 "|"
			print fn "|" group_name "|" book_name "|" chapter_name "|" page_line_number "|" p_line_number "|" paragraph_id "|" $0 > reference_file
            		}
		# remember that there are no blank lines
	        blank_line_count = 0
	# blank line condition
	} else {
#print "blank Line" > paragraph_file
		blank_line_count++
	        if (blank_line_count >= 4) {
		    subsection_number++
#print "subsection hit" subsection_number > paragraph_file
		    blank_line_count = 0
       	}
	}
   }
  }
}

		

DDL for Staging Tables

BEGIN;


CREATE TABLE IF NOT EXISTS "AllRoads"."Bible_Header_Stage_T"
(
    "File_Name" character varying(1000) NOT NULL,
    "Group_Name" text,
    "Book_Name" text,
    "Chapter_Name" text,
    "Page_Line_Number" bigint NOT NULL,
    "Header_Line_Number" bigint NOT NULL,
    "Header_Text" text NOT NULL
);

CREATE TABLE IF NOT EXISTS "AllRoads"."Bible_Paragraphs_Stage_T"
(
    "File_Name" character varying(1000) NOT NULL,
    "Group_Name" text,
    "Book_Name" text,
    "Chapter_Name" text,
    "Page_Line_Number" bigint NOT NULL,
    "Paragraph_Line_Number" bigint NOT NULL,
    "Paragraph_Id" bigint NOT NULL,
    "Paragraph_Text" text NOT NULL,
    "References" text
);

CREATE TABLE IF NOT EXISTS "AllRoads"."Bible_References_Stage_T"
(
    "Book_Name" text,
    "Group_Name" text,
    "File_Name" character varying(1000) NOT NULL,
    "Chapter_Name" text,
    "Page_Line_Number" bigint NOT NULL,
    "Reference_Line_Number" bigint NOT NULL,
    "Reference_Id" text NOT NULL,
    "Reference_Text" text NOT NULL
);

CREATE TABLE IF NOT EXISTS "AllRoads"."Bible_Pages_Stage_T"
(
    "File_Name" character varying(1000) NOT NULL,
    "Title" character varying(1000) NOT NULL,
    "URL" character varying(2048) NOT NULL
);
END;

SQL for Load

Texts_T

INSERT INTO "AllRoads"."Texts_T"
("Title"
 ,"Description"
 ,"URL"
 ,"Group"
 ,"Type"
 ,"Audit_Create_Date"
 ,"Audit_Create_User"
)
SELECT h."Book_Name" as "Title"
	   ,h."Book_Name" as "Description" --duplicate book name rather than put a null in it
	   ,pg."URL" as "URL"
	   ,h."Group_Name" as "Group"
	   ,'Bible' as "Type"
	   ,CURRENT_DATE as "Audit_Create_Date"
	   ,'Andy' as "Audit_Create_User"
FROM (
		SELECT *
		  FROM (
			SELECT distinct row_number() over (partition by "Group_Name", "Book_Name" order by cast(substr("File_Name",7,100) as numeric)) as rn,
					"File_Name", 
					"Group_Name", 
					"Book_Name"  
			FROM "AllRoads"."Bible_Header_Stage_T" h1
			WHERE "Book_Name" is not null
		) hdr
		where hdr.rn = 1
	 ) as h
     inner join
	 "AllRoads"."Bible_Pages_Stage_T" as pg on (h."File_Name" = 'Bible.' || pg."File_Name")-- and trim(h."Group_Name") = trim(p."Title"))
ORDER BY case h."Group_Name"
		     when 'The Pentateuch' then 1
			 when 'The Historical Books' then 2
			 when 'The Wisdom Books' then 3
			 when 'The Prophetic Books' then 4
			 when 'The Gospels' then 5
			 when 'New Testament Letters' then 6
			 when 'The Catholic Letters' then 7
		 end
		 ,cast(pg."File_Name" as numeric)
		 , pg."Title"
;

Paragraphs_T

INSERT INTO "AllRoads"."Paragraphs_T"
(
	"Paragraph_Sequence_Id"
	,"Paragraph_Text"
	,"URL"
	,"Reference_Sequence_Ids"
	,"Text_Id"
	,"Page_Id"
	,"Audit_Create_Date"
	,"Audit_Create_User"
)
SELECT case 
			when p."Paragraph_Id" = '-1' then null
			else p."Paragraph_Id"
		end as Paragraph_Sequence_Id
	   ,p."Paragraph_Text"
	   ,pg."URL"
	   ,to_json(string_to_array(p."References",',')) as Reference_Sequence_Ids
	   ,pg."Text_Id"
	   ,pg."Page_Id"
	   ,CURRENT_DATE AS Audit_Create_Date
	   ,'Andy' as Audit_Create_User
  FROM "AllRoads"."Bible_Paragraphs_Stage_T" as p
       INNER JOIN
	   "AllRoads"."Pages_T" as pg on (p."File_Name" = pg."File_Name")
ORDER BY pg."Page_Id", p."Paragraph_Line_Number"
;

Pages_T

INSERT INTO "AllRoads"."Pages_T"
(
	"Page_Sequence_Id"
	,"Title"
	,"Description"
	,"URL"
	,"File_Name"
	,"Text_Id"
	,"Chapter_Id"
	,"Audit_Create_Date"
	,"Audit_Create_User"
)
SELECT ROW_NUMBER() OVER(PARTITION BY pg1."Text_Id" ORDER BY CAST(pg1."File_Name" as INTEGER)) AS Page_Sequence_Id
	   ,pg1."Title"
	   ,pg1."Description"
	   ,pg1."URL"
	   ,pg1."Bible_File_Name"
	   ,pg1."Text_Id"
	   ,pg1."Chapter_Id"
	   ,pg1."Audit_Create_Date"
	   ,pg1."Audit_Create_User"
  FROM (

	SELECT COALESCE(h."Book_Name" || ' ' || pg."Title", pg."Title") as "Title"
		   ,COALESCE(c."Description", pg."Title") as "Description"
		   ,pg."URL"
	  	   ,pg."File_Name"
		   ,('Bible.' || pg."File_Name") as "Bible_File_Name"
		   ,COALESCE(t."Text_Id", 78) as "Text_Id" --78 is the default record in the Texts_T table for general Bible data.
		   ,COALESCE(c."Chapter_Id", 2770) as "Chapter_Id" --2770 is the default record in the Chapters_T for general Bible data.
		   ,CURRENT_DATE as "Audit_Create_Date"
		   ,'Andy' as "Audit_Create_User"
	  FROM "AllRoads"."Bible_Pages_Stage_T" as pg 
		   LEFT OUTER JOIN
		   (SELECT "File_Name", "Group_Name", "Book_Name", "Chapter_Name" from "AllRoads"."Bible_Header_Stage_T" WHERE "Chapter_Name" IS NOT NULL) as h on (h."File_Name" = 'Bible.' || pg."File_Name")	   
		   LEFT OUTER JOIN -- take into consideration that the preface doesn't have a chapter Id
		   "AllRoads"."Chapters_T" as c on (c."URL" = pg."URL") 
		   LEFT OUTER JOIN
		   "AllRoads"."Texts_T" as t on (t."Text_Id" = c."Text_Id")
) pg1
;

References_T

INSERT INTO "AllRoads"."References_T"
(
	"Reference_Sequence_Id"
	,"Reference_Text"
	,"Page_Id"
	,"Text_Id"
	,"Audit_Create_Date"
	,"Audit_Create_User"
)
SELECT (case when r."Reference_Id"~ '^[0-9]+$' = TRUE THEN r."Reference_Id" ELSE '-1' END) as Reference_Sequence_Id
       ,(case when r."Reference_Id"~ '^[0-9]+$' = FALSE THEN r."Reference_Id" ELSE '' END) || COALESCE(r."Reference_Text", 'No Reference Text') as Reference_Text
	   ,pg."Page_Id"
	   ,pg."Text_Id"
	   ,CURRENT_DATE AS Audit_Create_Date
	   ,'Andy' as Audit_Create_User
  FROM "AllRoads"."Bible_References_Stage_T" as r
       INNER JOIN
	   "AllRoads"."Pages_T" as pg on (r."File_Name" = pg."File_Name")
ORDER BY cast(pg."Text_Id" as numeric)
		 ,cast(pg."Page_Id" as numeric)
         ,cast((case when r."Reference_Id"~ '^[0-9]+$' = TRUE THEN r."Reference_Id" ELSE NULL END) as numeric)

Chapters_T

INSERT INTO "AllRoads"."Chapters_T"
("URL"
,"Title"
,"Description"
,"Text_Id"
 ,"Audit_Create_Date"
 ,"Audit_Create_User"
)
SELECT pg."URL"
       ,h."Book_Name" || ' ' || pg."Title" as "Title"
	   ,h."Book_Name" || ' - ' || h."Chapter_Name" as "Description"
	   ,tt."Text_Id"
	   ,CURRENT_DATE as "Audit_Create_Date"
	   ,'Andy' as "Audit_Update_User"
  FROM "AllRoads"."Bible_Pages_Stage_T" as pg
       INNER JOIN
	   (SELECT "File_Name", "Group_Name", "Book_Name", "Chapter_Name" from "AllRoads"."Bible_Header_Stage_T" WHERE "Chapter_Name" IS NOT NULL) as h on (h."File_Name" = 'Bible.' || pg."File_Name")	   
	   INNER JOIN
	   "AllRoads"."Texts_T" as tt on (h."Group_Name" = tt."Group" and h."Book_Name" = tt."Title")
;

The All Roads Project – Ingesting the Bible

Share this: