API Documentation

Submodules

pychemparse.data module

class pychemparse.data.Data(data: dict | None = None, comment: str = '')

Bases: object

A dictionary-like class with an additional comment field. Its truth value is False when the dictionary is empty, even if a comment is present.

Parameters:
  • data (dict, optional) – The main data dictionary to store the parsed data. Defaults to an empty dictionary if None is provided. An error is raised if the input is not a dictionary.

  • comment (str, optional) – An optional comment about the data. Defaults to an empty string.

Variables:
  • data (dict) – The main data dictionary where parsed data is stored. This attribute is initialized with the data parameter.

  • comment (str) – A comment about the data. This attribute is initialized with the comment parameter.
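The truthiness rule above can be illustrated with a minimal stand-in class. This is a sketch, not the library's implementation: `DataSketch` is a hypothetical name, and raising `TypeError` for non-dictionary input is an assumption (the description only says that an error is raised).

```python
from typing import Any, Optional


class DataSketch:
    """Hypothetical stand-in for pychemparse.data.Data (illustration only)."""

    def __init__(self, data: Optional[dict] = None, comment: str = "") -> None:
        if data is None:
            data = {}
        if not isinstance(data, dict):
            # Assumed error type; the documentation only says "an error is raised".
            raise TypeError("data must be a dict")
        self.data = data
        self.comment = comment

    def __bool__(self) -> bool:
        # Truthiness depends on the dictionary alone; the comment is ignored.
        return bool(self.data)

    def __getitem__(self, key: str) -> Any:
        return self.data[key]

    def __len__(self) -> int:
        return len(self.data)


empty = DataSketch(comment="nothing parsed")
filled = DataSketch({"energy": -76.0})
print(bool(empty), bool(filled))  # False True
```

Defining `__bool__` explicitly keeps the comment from influencing truth tests such as `if result:`.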

clear() None

Remove all items from the dictionary.

copy() dict

Return a shallow copy of the dictionary.

Returns:

A shallow copy of the dictionary.

Return type:

dict

get(key: str, default: any | None = None) any

Safely retrieve an item by key, returning a default value if the key does not exist.

Parameters:
  • key (str) – The key of the item to retrieve.

  • default – The default value to return if the key does not exist. Defaults to None.

Returns:

The value associated with the key, or the default value.

Return type:

any

items() ItemsView[str, any]

Return a view object that displays a list of the dictionary’s key-value tuple pairs.

Returns:

A view object displaying the dictionary’s key-value pairs.

Return type:

ItemsView[str, any]

keys() KeysView[str]

Return a view object that displays a list of the dictionary’s keys.

Returns:

A view object displaying the dictionary’s keys.

Return type:

KeysView[str]

pop(key: str, default=None) any

Remove the specified key and return the corresponding value. If the key is not found, default is returned if provided, otherwise KeyError is raised.

Parameters:
  • key (str) – The key to remove and return its value.

  • default – The value to return if the key is not found. Defaults to None.

Returns:

The value for the key if the key is in the dictionary, else default.

Return type:

any

popitem() tuple[str, any]

Remove and return a (key, value) pair from the dictionary in LIFO order. Raises KeyError if the dictionary is empty.

Returns:

The removed (key, value) pair.

Return type:

tuple[str, any]

setdefault(key: str, default=None) any

Return the value for key if key is in the dictionary; otherwise insert key with the value default and return default.

Parameters:
  • key (str) – The key to check or insert in the dictionary.

  • default – The value to set if the key is not already in the dictionary. Defaults to None.

Returns:

The value for the key if the key is in the dictionary, else default.

Return type:

any

update(*args, **kwargs) None

Update the dictionary with the key/value pairs from a mapping or iterable passed positionally and from any keyword arguments, overwriting existing keys.

Parameters:
  • args – A dictionary or an iterable of key/value pairs (as tuples or other iterables of length two).

  • kwargs – Additional key/value pairs to update the dictionary with.
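Assuming Data forwards these methods to its inner dictionary (the descriptions above match the built-in dict methods), their semantics can be checked with a plain dict. This exercises built-in dict behavior only, not pychemparse itself.

```python
# Built-in dict behaviour that the Data methods above are described as mirroring.
d = {"energy": -76.0}

# update(): positional iterable of pairs plus keyword arguments; existing keys are overwritten.
d.update([("charge", 0), ("energy", -76.5)], multiplicity=1)

# setdefault(): an existing key returns its value and leaves it unchanged...
assert d.setdefault("charge", 99) == 0
# ...while a missing key is inserted with the default, which is returned.
assert d.setdefault("method", "B3LYP") == "B3LYP"

# pop() with an explicit default returns it instead of raising KeyError.
assert d.pop("missing", None) is None

# popitem() removes pairs in LIFO order: the most recently inserted key comes out first.
assert d.popitem() == ("method", "B3LYP")
print(d)  # {'energy': -76.5, 'charge': 0, 'multiplicity': 1}
```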

values() ValuesView[any]

Return a view object that displays a list of the dictionary’s values.

Returns:

A view object of the dictionary’s values.

Return type:

ValuesView[any]

pychemparse.elements module

class pychemparse.elements.AvailableBlocksGeneral

Bases: object

Manages a registry of different types of block elements within a structured document.

This class provides a dynamic registry of block types, so that support for new block elements can be added modularly: new block classes are registered through the class methods register_block and rewrite_block.

Variables:

blocks (dict[str, type[Element]]) – A mapping of block names to their corresponding block class definitions.

blocks: dict[str, type[Element]] = {}

classmethod register_block(block_cls: type[Element]) type[Element]

Registers a new block type in the blocks registry.

This method acts as a decorator for registering block classes. It raises a ValueError if a block class with the same name is already registered, preventing unintentional overwrites.

Parameters:

block_cls (type[Element]) – The block class to be registered.

Returns:

The block class, facilitating use as a decorator.

Return type:

type[Element]

Raises:

ValueError – If a block with the same class name is already registered.

classmethod rewrite_block(block_cls: type[Element]) type[Element]

Registers or redefines a block type in the blocks registry.

Unlike register_block, this method allows the redefinition of existing block types by overwriting them if necessary. It is used when an update or replacement for an existing block definition is required.

Parameters:

block_cls (type[Element]) – The block class to be registered or redefined.

Returns:

The block class, enabling use as a decorator.

Return type:

type[Element]
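A minimal sketch of the decorator-based registry pattern described above. The names `RegistrySketch`, `OrcaRegistrySketch` and `BlockDemo` are hypothetical. Keying entries by `__name__` follows the description ("same class name"); giving each registry subclass its own `blocks` dict is an assumption, suggested by the empty default shown here and the populated dict shown for AvailableBlocksOrca.

```python
class Element:
    """Minimal stand-in for pychemparse.elements.Element."""


class RegistrySketch:
    """Hypothetical sketch of the AvailableBlocksGeneral pattern, not the library code."""

    blocks: dict[str, type] = {}

    @classmethod
    def register_block(cls, block_cls: type) -> type:
        name = block_cls.__name__
        if name in cls.blocks:
            # Refuse silent overwrites, mirroring the documented ValueError.
            raise ValueError(f"Block {name!r} is already registered")
        cls.blocks[name] = block_cls
        return block_cls  # returning the class is what makes it usable as a decorator

    @classmethod
    def rewrite_block(cls, block_cls: type) -> type:
        # Register or overwrite without checking for an existing entry.
        cls.blocks[block_cls.__name__] = block_cls
        return block_cls


class OrcaRegistrySketch(RegistrySketch):
    # A separate dict per registry, so these blocks do not leak into the base registry.
    blocks: dict[str, type] = {}


@OrcaRegistrySketch.register_block
class BlockDemo(Element):
    pass


try:
    OrcaRegistrySketch.register_block(BlockDemo)
except ValueError as err:
    print(err)  # Block 'BlockDemo' is already registered

OrcaRegistrySketch.rewrite_block(BlockDemo)  # allowed: overwrites the existing entry
print(sorted(OrcaRegistrySketch.blocks), RegistrySketch.blocks)  # ['BlockDemo'] {}
```

Without the subclass-level `blocks` attribute, `cls.blocks` would resolve to the parent's dict and every registry would share one mapping.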

class pychemparse.elements.Block(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Element

Represents a complex data block within a structured document.

Extends the Element class to encapsulate a more structured unit of data, potentially including identifiable components such as a name, header, and body. It provides methods to extract and present these components, with a default implementation for name extraction.

Variables:
  • data_available (bool) – Indicates whether the block contains extractable data. Defaults to False.

  • position (tuple | None) – The position of the block within the larger document structure, often expressed as a range of line numbers.

  • specified_class_name – A placeholder for the block’s subtype if it cannot be determined during processing. Defaults to None.

body() str

Retrieve the body content of the block.

Utilizes the extract_name_header_and_body method to extract the body of the block, which contains the main content.

Returns:

The body content of the block.

Return type:

str

static body_preformat(body_raw: str) str

Format the raw body content for HTML display.

This static method wraps the raw body text in HTML <pre> tags to enhance its presentation in HTML format.

Parameters:

body_raw (str) – The raw text of the body.

Returns:

The formatted body text, suitable for HTML display.

Return type:

str

data_available: bool = False

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

header() str | None

Retrieve the block’s header, if it exists.

Uses the extract_name_header_and_body method to determine the presence and content of a header within the block.

Returns:

The block’s header if present, otherwise None.

Return type:

str | None

static header_preformat(header_raw: str) str

Format the raw header content for HTML display.

This static method wraps the raw header text in HTML <pre> tags to enhance its presentation in HTML format.

Parameters:

header_raw (str) – The raw text of the header.

Returns:

The formatted header text, suitable for HTML display.

Return type:

str

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

specified_class_name: str | None = None

to_html() str

Generate an HTML representation of the block.

Constructs an HTML structure for the block, incorporating the name, header (if present), and body. The depth of the block within the document structure influences the header’s HTML level.

Returns:

A string containing the HTML representation of the block, with header and body sections formatted and wrapped in appropriate HTML tags.

Return type:

str
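The composition described for to_html can be sketched as a standalone function. `block_to_html` is a hypothetical name, and both the wrapper markup and the depth-to-heading mapping (`depth + 1`, capped at `<h6>`) are assumptions; the description only says that depth influences the heading level.

```python
from typing import Optional


def block_to_html(name: str, header: Optional[str], body: str, depth: int) -> str:
    """Hypothetical sketch of Block.to_html(); the markup and level mapping are assumed."""
    # Assumed mapping: deeper blocks get smaller headings, capped at <h6>.
    level = min(depth + 1, 6)
    parts = [f"<h{level}>{name}</h{level}>"]
    if header is not None:
        parts.append(f"<pre>{header}</pre>")  # header wrapped like header_preformat()
    parts.append(f"<pre>{body}</pre>")  # body wrapped like body_preformat()
    return '<div class="block">' + "".join(parts) + "</div>"


print(block_to_html("SCF ITERATIONS", None, "ITER  Energy", depth=1))
# <div class="block"><h2>SCF ITERATIONS</h2><pre>ITER  Energy</pre></div>
```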

class pychemparse.elements.BlockUnknown(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

Represents a block of an unrecognized or unknown type within a structured document.

This class is used as a fallback for blocks that do not match any of the registered block types, allowing for generic handling of unknown or unstructured data.

data() Data

Warns about the unstructured nature of the block and returns its raw data encapsulated in a Data instance.

This method is called when attempting to process an unknown block type, issuing a warning about the lack of a structured extraction process and suggesting contributions for handling such blocks.

Returns:

A Data instance containing the block’s raw data and a comment about its unstructured nature.

Return type:

Data

class pychemparse.elements.Element(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: object

Represents a basic element within a structured document, serving as a fundamental unit of data.

An Element encapsulates raw data, positional information, and provides methods for data extraction and presentation. It acts as a base class for more specialized elements tailored to specific data types or structures within a document.

Parameters:
  • raw_data (str) – The raw text data associated with the element.

  • char_position (tuple[int, int] | None, optional) – The character position range (start, end) of the element within the larger data structure, if applicable.

  • line_position (tuple[int, int] | None, optional) – The line position range (start, end) of the element within the larger data structure, if applicable.

Variables:
  • raw_data (str) – The raw text data associated with this element.

  • char_position (tuple[int, int] | None) – The character position range of the element within the data structure, or None.

  • line_position (tuple[int, int] | None) – The line position range of the element within the data structure, or None.

data() Data

Process the raw data of the element to extract meaningful information.

This method is designed to be overridden by subclasses to implement specific data extraction logic tailored to the element’s structure and content.

Returns:

An instance of the Data class containing ‘raw data’ as its content, accompanied by a comment indicating the absence of specific data extraction procedures.

Return type:

Data

Warns:

Warning – Issued to indicate that no specific procedure for analyzing the data was implemented; the raw data is still returned.
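The base-class fallback (return the raw text with a comment, and signal that no extraction logic exists) and its intended override by subclasses can be sketched as follows. `ElementSketch` and `EnergyLineSketch` are hypothetical, a plain dict stands in for Data, and the exact warning text and category are assumptions.

```python
import warnings


class ElementSketch:
    """Hypothetical stand-in for pychemparse.elements.Element (illustration only)."""

    def __init__(self, raw_data: str) -> None:
        self.raw_data = raw_data

    def data(self) -> dict:
        # Base behaviour: warn that nothing specific was implemented, then fall back to raw text.
        comment = "No procedure for analyzing the data was implemented."
        warnings.warn(comment)
        return {"data": {"raw data": self.raw_data}, "comment": comment}


class EnergyLineSketch(ElementSketch):
    """Hypothetical subclass that overrides data() with real extraction logic."""

    def data(self) -> dict:
        # Take the last whitespace-separated token as the energy value.
        return {"data": {"Energy": float(self.raw_data.split()[-1])}, "comment": ""}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = ElementSketch("some unparsed text").data()
print(len(caught), fallback["data"])  # 1 {'raw data': 'some unparsed text'}
print(EnergyLineSketch("FINAL SINGLE POINT ENERGY  -76.0").data()["data"])  # {'Energy': -76.0}
```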

static data_preformat(data_raw: str) str

Format the raw data for HTML display.

This static method wraps the raw data in HTML <pre> tags for better readability when displayed as HTML.

Parameters:

data_raw (str) – The raw text to be formatted.

Returns:

The formatted text wrapped in HTML <pre> tags.

Return type:

str

depth() int

Calculate the depth of nested structures within the element.

This method computes the maximum depth of nested lists representing the hierarchical structure of the element, indicating the complexity of its structure.

Returns:

The maximum depth of the element’s nested list structure.

Return type:

int

get_structure() dict[Self, tuple | None]

Retrieve the structural representation of the element as a nested dictionary.

This method provides a way to represent the hierarchical relationships within data, where each element can contain nested sub-elements.

Returns:

A dictionary with the element itself as the key and an empty tuple as the value, indicating no nested structure by default.

Return type:

dict[Self, tuple | None]

static max_depth(d) int

Compute the maximum depth of a nested list structure.

This utility method assists in determining the complexity of an element’s structural hierarchy by calculating the depth of nested lists.

Parameters:

d (list | dict) – A nested list or dictionary representing the structure of an element or a complex data structure.

Returns:

The maximum depth of the nested list or dictionary structure.

Return type:

int
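A recursive sketch of the depth computation, assuming one level per list or dict (for dicts, only the values are descended into) and zero for any other value. The library's exact counting convention, for example whether an empty container counts as a level, is an assumption here.

```python
def max_depth(d) -> int:
    """Hypothetical re-implementation: depth of nested lists/dicts (other values count as 0)."""
    if isinstance(d, dict):
        children = list(d.values())  # for dicts, only the values are nested further
    elif isinstance(d, list):
        children = d
    else:
        return 0
    # One level for this container plus the deepest child; empty containers count as 1.
    return 1 + max((max_depth(child) for child in children), default=0)


print(max_depth([1, [2, [3]], []]), max_depth({"a": [1]}))  # 3 2
```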

static process_invalid_name(input_string: str) str

Clean and process an input string to generate a valid name or identifier.

This method sanitizes input strings that may contain invalid characters or formatting, ensuring the output is suitable for use as a name or identifier. It handles strings without letters by labeling them as “Unknown” and removes non-alphabetic characters from other strings.

Parameters:

input_string (str) – The input string to be processed.

Returns:

A cleaned and possibly truncated version of the input string, made suitable for use as a name or identifier.

Return type:

str
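A sketch of the cleaning rules stated above: strings without letters become "Unknown", and non-alphabetic characters are removed otherwise. The truncation limit `max_length` and the restriction to ASCII letters are assumptions, not documented behavior.

```python
import re


def process_invalid_name(input_string: str, max_length: int = 50) -> str:
    """Hypothetical sketch; max_length is an assumed truncation limit."""
    if not re.search(r"[A-Za-z]", input_string):
        return "Unknown"  # no letters at all
    # Keep ASCII letters only, since non-alphabetic characters are described as removed.
    cleaned = re.sub(r"[^A-Za-z]", "", input_string)
    return cleaned[:max_length]


print(process_invalid_name("--- 123 ---"))  # Unknown
print(process_invalid_name("SCF-2 converged!"))  # SCFconverged
```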

readable_name() None

Generate a readable name for the element based on its data.

This method is intended to be overridden by subclasses to provide a meaningful, human-readable name derived from the element’s content.

Returns:

None by default, indicating the method has not been implemented. Subclasses should override this method.

Return type:

None

to_html() str

Generate an HTML representation of the element.

This method provides a basic HTML structure for displaying the element’s data. Subclasses may override this method to provide more specialized HTML representations tailored to the element’s specific characteristics.

Returns:

A string containing the HTML representation of the element, incorporating the preformatted raw data.

Return type:

str

exception pychemparse.elements.ExtractionError

Bases: Exception

Custom exception class for errors encountered during energy extraction processes.

This exception is raised when there is a problem with extracting energy-related data from a given source or dataset.
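One way such an exception is typically used: a parser raises it when the expected energy line is absent, instead of returning a misleading value. The helper `extract_final_energy` is hypothetical, the exception is a local stand-in, and the regular expression assumes the energy appears on a line of the form "FINAL SINGLE POINT ENERGY <number>".

```python
import re


class ExtractionError(Exception):
    """Local stand-in mirroring pychemparse.elements.ExtractionError."""


def extract_final_energy(raw_data: str) -> float:
    """Hypothetical helper: pull the energy from a 'FINAL SINGLE POINT ENERGY' line."""
    match = re.search(r"FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)", raw_data)
    if match is None:
        # Signal a failed extraction instead of returning a misleading value.
        raise ExtractionError("No final single point energy found")
    return float(match.group(1))


print(extract_final_energy("FINAL SINGLE POINT ENERGY       -76.026760737"))
try:
    extract_final_energy("no energy here")
except ExtractionError as err:
    print(f"ExtractionError: {err}")
```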

class pychemparse.elements.Spacer(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Element

data() None

Indicate that no data is associated with a Spacer element.

Overrides the data method from the Element class to return None, reflecting the intended use of a Spacer as a representation of empty space or a separator without meaningful data.

Returns:

None, indicating the absence of data.

Return type:

None

static data_preformat(data_raw: str) str

Format raw Spacer content for HTML display by replacing newlines with HTML line breaks.

Parameters:

data_raw (str) – The raw text content of the spacer.

Returns:

The formatted content with newlines converted to HTML line breaks.

Return type:

str
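The two preformatting styles documented in this module (wrapping in `<pre>` tags for Element and Block content, converting newlines to line breaks for Spacer) can be sketched as two one-line functions. The function names are hypothetical, and the exact break tag (`<br>` rather than `<br/>`) is an assumption.

```python
def pre_wrap(data_raw: str) -> str:
    """Sketch of Element.data_preformat(): wrap raw text in <pre> tags."""
    return f"<pre>{data_raw}</pre>"


def spacer_preformat(data_raw: str) -> str:
    """Sketch of Spacer.data_preformat(): turn each newline into an HTML line break."""
    return data_raw.replace("\n", "<br>")


print(pre_wrap("  ITER   Energy"))  # <pre>  ITER   Energy</pre>
print(spacer_preformat("\n\n"))  # <br><br>
```

A `<pre>` block preserves the column alignment of program output, while a spacer only needs to reproduce vertical space, so line breaks suffice.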

pychemparse.file module

class pychemparse.file.File(file_path: str, regex_settings: RegexSettings | None = None, mode: str | None = 'ORCA')

Bases: object

Manages the processing of a file within the ChemParser framework.

This class is responsible for parsing a given file, identifying and extracting elements based on predefined regex patterns, and facilitating the generation of an HTML representation of the file’s content.

Variables:
  • file_path (str) – The path to the input file being processed.

  • regex_settings (RegexSettings) – The regex settings utilized for pattern processing within the file.

  • initialized (bool) – A flag indicating whether the instance has been properly initialized with file content and regex settings.

  • original_text (str) – The original textual content read from the file.

  • _blocks (pd.DataFrame) – A DataFrame storing the processed elements identified within the file.

  • _marked_text (list[tuple[tuple[int, int], tuple[int, int], str | Element]]) – A list of marked text segments, each containing character and line positions alongside the corresponding text or Element object.

  • mode (str) – The operational mode of the file, which may affect regex settings and processing behavior. Common modes include ‘ORCA’, ‘GPAW’ and ‘VASP’.

Parameters:
  • file_path (str) – The path to the file to be processed.

  • regex_settings (Optional[RegexSettings], optional) – Custom regex settings for pattern processing. If not provided, default settings based on the specified mode will be used.

  • mode (Optional[str], optional) – The processing mode, influencing default regex settings and behavior. Supported modes include ‘ORCA’, ‘GPAW’ and ‘VASP’.

Raises:

ValueError – If an invalid mode is specified.

create_html(css_content: str | None = None, js_content: str | None = None, insert_css: bool | None = True, insert_js: bool | None = True, insert_left_sidebar: bool | None = True, insert_colorcomment_sidebar: bool | None = True, show_progress: bool | None = False) str

Constructs a complete HTML document from processed text, integrating optional CSS and JavaScript content.

Parameters:
  • css_content (Optional[str], optional) – Custom CSS to be included in the HTML document. Defaults to predefined CSS if not provided.

  • js_content (Optional[str], optional) – Custom JavaScript to be included in the HTML document. Defaults to predefined JavaScript if not provided.

  • insert_css (Optional[bool], optional) – Determines whether to include CSS content in the HTML document.

  • insert_js (Optional[bool], optional) – Determines whether to include JavaScript content in the HTML document.

  • insert_left_sidebar (Optional[bool], optional) – Specifies whether to include a left sidebar for the Table of Contents (TOC) in the HTML document.

  • insert_colorcomment_sidebar (Optional[bool], optional) – Specifies whether to include a comment sidebar for additional annotations in the HTML document.

  • show_progress (Optional[bool], optional) – Specifies whether to display a progress bar during operation.

Returns:

The complete HTML document as a string.

Return type:

str

depth() int

Calculates the maximum depth of nested structures within the File instance.

Returns:

The maximum depth of nested elements’ structures.

Return type:

int

static extract_data_errors_to_none(orca_element: Element) Data | None

Tries to extract data from an Element, handling errors by returning None.

This method encapsulates error handling during data extraction from an Element. If an error occurs, the issue is logged, and None is returned.

Parameters:

orca_element (Element) – An Element instance from which data is to be extracted.

Returns:

The extracted data in Data format from the Element, or None if an error occurred.

Return type:

Data | None

static extract_raw_data_errors_to_none(orca_element: Element) str | None

Tries to extract raw data from an Element, returning None in case of errors.

This method is designed to handle errors gracefully during the extraction of raw data from an Element. If an error occurs, a warning is issued and None is returned.

Parameters:

orca_element (Element) – An instance of Element from which raw data is to be extracted.

Returns:

The extracted raw data from the Element, or None if an error occurred.

Return type:

str | None
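Both helpers follow the same "errors to None" pattern: attempt the extraction, and on failure emit a diagnostic and return None so that one malformed block does not abort processing of the whole file. A generic sketch, with the hypothetical names `call_or_none` and `BrokenElement`:

```python
import warnings
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_or_none(extract: Callable[[], T]) -> Optional[T]:
    """Hypothetical helper: run an extraction callable, turning any exception into None."""
    try:
        return extract()
    except Exception as exc:  # deliberately broad: one bad block must not abort the whole file
        warnings.warn(f"Extraction failed: {exc!r}")
        return None


class BrokenElement:
    """Hypothetical element whose data() always fails."""

    raw_data = "garbled block"

    def data(self):
        raise ValueError("cannot parse")


element = BrokenElement()
print(call_or_none(element.data))  # None (a warning is emitted)
print(call_or_none(lambda: element.raw_data))  # garbled block
```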

get_blocks(show_progress: bool | None = False) DataFrame

Retrieves all processed blocks as a DataFrame, ensuring the file has been initialized.

Parameters:

show_progress (Optional[bool], optional) – Optionally displays a progress bar during initialization.

Returns:

A DataFrame containing processed blocks with their metadata.

Return type:

pd.DataFrame

get_data(extract_only_raw: bool | None = False, element_type: type[Element] | None = None, readable_name: str | None = None, raw_data_substring: str | Iterable[str] | None = None, raw_data_not_substring: str | Iterable[str] | None = None, show_progress: bool | None = False) DataFrame

Retrieves and extracts data from Element instances based on search criteria, with an option to extract raw or processed data.

Parameters:
  • extract_only_raw (Optional[bool], optional) – If True, only raw data will be extracted, bypassing any custom data extraction logic defined in Element subclasses.

  • element_type (Optional[type[Element]], optional) – The type of Element to filter by; only elements of this type will be considered.

  • readable_name (Optional[str], optional) – A filter for elements that have this exact readable_name.

  • raw_data_substring (Optional[str | Iterable[str]], optional) – A filter for elements whose raw_data contains the given substring(s).

  • raw_data_not_substring (Optional[str | Iterable[str]], optional) – A filter for elements whose raw_data does not contain the given substring(s).

  • show_progress (Optional[bool], optional) – If True, displays a progress bar during the operation.

Returns:

A DataFrame of the filtered elements with their extracted data.

Return type:

pd.DataFrame

get_marked_text(show_progress: bool | None = False) list[tuple[tuple[int, int], tuple[int, int], str | Element]]

Retrieves the text segments with associated markers after processing patterns, ensuring the file has been initialized.

Parameters:

show_progress (Optional[bool], optional) – Optionally displays a progress bar during initialization.

Returns:

A list of text segments marked with their character and line positions, alongside the corresponding text or Element object.

Return type:

list[tuple[tuple[int, int], tuple[int, int], str | Element]]

get_structure() dict[Self, list]

Retrieves the hierarchical structure of the File instance, representing the organization of processed elements.

Returns:

A dictionary mapping the File instance to a list of its elements’ structures.

Return type:

dict[Self, list]

initialize(show_progress: bool | None = False) None

Initializes the File instance by processing patterns, if not already done, to identify and categorize text segments.

Parameters:

show_progress (Optional[bool], optional) – Optionally displays a progress bar during the pattern processing phase.

process_patterns(show_progress: bool | None = False) None

Identifies and categorizes text segments based on predefined regex patterns, updating the internal storage of blocks and marked text.

Parameters:

show_progress (Optional[bool], optional) – Optionally displays a progress bar during the processing of regex patterns.
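The marked-text structure can be illustrated by a sketch that scans text with one regex and records matched and unmatched segments with their character range and zero-based line range. `mark_text` is hypothetical and uses a single pattern; the library applies its configured RegexSettings and presumably wraps matched segments in Element objects rather than plain strings.

```python
import re


def mark_text(text: str, pattern: str) -> list[tuple[tuple[int, int], tuple[int, int], str]]:
    """Hypothetical sketch: split text into matched and unmatched segments with positions."""

    def line_of(index: int) -> int:
        # Zero-based line number of a character offset.
        return text.count("\n", 0, index)

    segments = []
    pos = 0
    for match in re.finditer(pattern, text, flags=re.MULTILINE):
        start, end = match.span()
        if start > pos:  # unmatched text before this match
            segments.append(((pos, start), (line_of(pos), line_of(start)), text[pos:start]))
        segments.append(((start, end), (line_of(start), line_of(end)), match.group(0)))
        pos = end
    if pos < len(text):  # trailing unmatched text
        segments.append(((pos, len(text)), (line_of(pos), line_of(len(text))), text[pos:]))
    return segments


sample = "header\nFINAL SINGLE POINT ENERGY -76.0\nfooter\n"
for chars, lines, segment in mark_text(sample, r"^FINAL SINGLE POINT ENERGY.*$"):
    print(chars, lines, repr(segment))
```

Joining the segment texts in order reproduces the original text, which is what makes such a list usable for rendering the full file.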

save_as_html(output_file_path: str, insert_css: bool | None = True, insert_js: bool | None = True, insert_left_sidebar: bool | None = True, insert_colorcomment_sidebar: bool | None = True, show_progress: bool | None = False)

Generates and saves an HTML document based on the processed content of the File instance, with customizable display options.

This method leverages create_html to construct the HTML content, including optional CSS and JavaScript, as well as sidebars for navigation and comments. The complete HTML is then saved to the specified file path.

Parameters:
  • output_file_path (str) – The file path, including the name and extension, where the HTML document will be saved. Existing files will be overwritten.

  • insert_css (Optional[bool], optional) – If True, includes CSS content in the HTML document for styling. Defaults to True.

  • insert_js (Optional[bool], optional) – If True, includes JavaScript content in the HTML document for interactivity. Defaults to True.

  • insert_left_sidebar (Optional[bool], optional) – If True, includes a left sidebar in the HTML document, typically used for a Table of Contents (TOC). Defaults to True.

  • insert_colorcomment_sidebar (Optional[bool], optional) – If True, includes a sidebar for additional annotations or comments in the HTML document. Defaults to True.

  • show_progress (Optional[bool], optional) – If True, displays a progress indicator during the HTML content generation process. Defaults to False.

Note:

This method exports the processed content to HTML for viewing in a web browser or for further processing with HTML-compatible tools. The embedded CSS and JavaScript control the document’s appearance and interactivity, and the optional sidebars provide navigation and annotations.

search_elements(element_type: type[Element] | None = None, readable_name: str | None = None, raw_data_substring: str | Iterable[str] | None = None, raw_data_not_substring: str | Iterable[str] | None = None, show_progress: bool = False) DataFrame

Searches for Element instances based on specified criteria, such as element type, readable name, and raw data content.

Parameters:
  • element_type (type[Element] | None, optional) – The class type of Element to search for, if filtering by type.

  • readable_name (str | None, optional) – The exact term to search for in the readable_name attribute of Element.

  • raw_data_substring (str | Iterable[str] | None, optional) – The substring(s) to search for within the raw_data attribute of Element.

  • raw_data_not_substring (str | Iterable[str] | None, optional) – The substring(s) whose absence within the raw_data attribute is required.

  • show_progress (bool, optional) – Whether to display a progress bar during initialization.

Returns:

A DataFrame containing filtered Element instances based on the provided criteria.

Return type:

pd.DataFrame
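The filter combination can be sketched over plain records instead of a DataFrame. `Record` and `search` are hypothetical; the element type is compared by class name for brevity, and the treatment of multiple substrings (every required substring must occur, none of the excluded ones may occur) is an assumption about the semantics.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass
class Record:
    """Hypothetical row: an element's type name, readable name and raw text."""
    element_type: str
    readable_name: str
    raw_data: str


def _as_list(value: Union[str, Iterable[str], None]) -> list[str]:
    # Accept a single string or an iterable of strings.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def search(
    records: list[Record],
    element_type: Optional[str] = None,
    readable_name: Optional[str] = None,
    raw_data_substring: Union[str, Iterable[str], None] = None,
    raw_data_not_substring: Union[str, Iterable[str], None] = None,
) -> list[Record]:
    required = _as_list(raw_data_substring)
    excluded = _as_list(raw_data_not_substring)
    result = []
    for rec in records:
        if element_type is not None and rec.element_type != element_type:
            continue
        if readable_name is not None and rec.readable_name != readable_name:  # exact match
            continue
        if not all(s in rec.raw_data for s in required):  # assumed: every substring must occur
            continue
        if any(s in rec.raw_data for s in excluded):  # none of these may occur
            continue
        result.append(rec)
    return result


records = [
    Record("BlockOrcaScf", "SCF ITERATIONS", "ITER Energy\n 0 -76.0\nSCF CONVERGED"),
    Record("BlockOrcaScf", "SCF ITERATIONS", "ITER Energy\n 0 -75.9\nSCF NOT CONVERGED"),
    Record("BlockOrcaWarnings", "WARNINGS", "WARNING: small basis set"),
]
hits = search(records, element_type="BlockOrcaScf", raw_data_substring="CONVERGED", raw_data_not_substring="NOT")
print(len(hits))  # 1
```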

pychemparse.orca_elements module

class pychemparse.orca_elements.AvailableBlocksOrca

Bases: AvailableBlocksGeneral

A class to store all available blocks for ORCA.

blocks: dict[str, type[Element]] = {
    'BlockOrcaAbsorptionSpectrumViaTransitionElectricDipoleMoments': <class 'pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionElectricDipoleMoments'>,
    'BlockOrcaAbsorptionSpectrumViaTransitionVelocityDipoleMoments': <class 'pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionVelocityDipoleMoments'>,
    'BlockOrcaAcknowledgement': <class 'pychemparse.orca_elements.BlockOrcaAcknowledgement'>,
    'BlockOrcaAllRightsReserved': <class 'pychemparse.orca_elements.BlockOrcaAllRightsReserved'>,
    'BlockOrcaAuxJBasis': <class 'pychemparse.orca_elements.BlockOrcaAuxJBasis'>,
    'BlockOrcaCdSpectrum': <class 'pychemparse.orca_elements.BlockOrcaCdSpectrum'>,
    'BlockOrcaCdSpectrumViaTransitionElectricDipoleMoments': <class 'pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionElectricDipoleMoments'>,
    'BlockOrcaCdSpectrumViaTransitionVelocityDipoleMoments': <class 'pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionVelocityDipoleMoments'>,
    'BlockOrcaCiNebConvergence': <class 'pychemparse.orca_elements.BlockOrcaCiNebConvergence'>,
    'BlockOrcaContributions': <class 'pychemparse.orca_elements.BlockOrcaContributions'>,
    'BlockOrcaDipoleMoment': <class 'pychemparse.orca_elements.BlockOrcaDipoleMoment'>,
    'BlockOrcaErrorMessage': <class 'pychemparse.orca_elements.BlockOrcaErrorMessage'>,
    'BlockOrcaFinalSinglePointEnergy': <class 'pychemparse.orca_elements.BlockOrcaFinalSinglePointEnergy'>,
    'BlockOrcaGeometryConvergence': <class 'pychemparse.orca_elements.BlockOrcaGeometryConvergence'>,
    'BlockOrcaHurrayCI': <class 'pychemparse.orca_elements.BlockOrcaHurrayCI'>,
    'BlockOrcaHurrayOptimization': <class 'pychemparse.orca_elements.BlockOrcaHurrayOptimization'>,
    'BlockOrcaHurrayTS': <class 'pychemparse.orca_elements.BlockOrcaHurrayTS'>,
    'BlockOrcaIcon': <class 'pychemparse.orca_elements.BlockOrcaIcon'>,
    'BlockOrcaInputFile': <class 'pychemparse.orca_elements.BlockOrcaInputFile'>,
    'BlockOrcaLibXc': <class 'pychemparse.orca_elements.BlockOrcaLibXc'>,
    'BlockOrcaLibint2': <class 'pychemparse.orca_elements.BlockOrcaLibint2'>,
    'BlockOrcaOrbitalBasis': <class 'pychemparse.orca_elements.BlockOrcaOrbitalBasis'>,
    'BlockOrcaOrbitalEnergies': <class 'pychemparse.orca_elements.BlockOrcaOrbitalEnergies'>,
    'BlockOrcaPathSummaryForNebCi': <class 'pychemparse.orca_elements.BlockOrcaPathSummaryForNebCi'>,
    'BlockOrcaPathSummaryForNebTs': <class 'pychemparse.orca_elements.BlockOrcaPathSummaryForNebTs'>,
    'BlockOrcaRotationalSpectrum': <class 'pychemparse.orca_elements.BlockOrcaRotationalSpectrum'>,
    'BlockOrcaScf': <class 'pychemparse.orca_elements.BlockOrcaScf'>,
    'BlockOrcaScfConverged': <class 'pychemparse.orca_elements.BlockOrcaScfConverged'>,
    'BlockOrcaScfType': <class 'pychemparse.orca_elements.BlockOrcaScfType'>,
    'BlockOrcaShark': <class 'pychemparse.orca_elements.BlockOrcaShark'>,
    'BlockOrcaSoscf': <class 'pychemparse.orca_elements.BlockOrcaSoscf'>,
    'BlockOrcaSpectrumType': <class 'pychemparse.orca_elements.BlockOrcaSpectrumType'>,
    'BlockOrcaTddftExcitedStatesSinglets': <class 'pychemparse.orca_elements.BlockOrcaTddftExcitedStatesSinglets'>,
    'BlockOrcaTddftTdaExcitedStates': <class 'pychemparse.orca_elements.BlockOrcaTddftTdaExcitedStates'>,
    'BlockOrcaTerminatedNormally': <class 'pychemparse.orca_elements.BlockOrcaTerminatedNormally'>,
    'BlockOrcaTimingsForIndividualModules': <class 'pychemparse.orca_elements.BlockOrcaTimingsForIndividualModules'>,
    'BlockOrcaTotalRunTime': <class 'pychemparse.orca_elements.BlockOrcaTotalRunTime'>,
    'BlockOrcaTotalScfEnergy': <class 'pychemparse.orca_elements.BlockOrcaTotalScfEnergy'>,
    'BlockOrcaUnrecognizedHurray': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedHurray'>,
    'BlockOrcaUnrecognizedMessage': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedMessage'>,
    'BlockOrcaUnrecognizedNotification': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedNotification'>,
    'BlockOrcaUnrecognizedScf': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedScf'>,
    'BlockOrcaUnrecognizedWithHeader': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedWithHeader'>,
    'BlockOrcaUnrecognizedWithSingeLineHeader': <class 'pychemparse.orca_elements.BlockOrcaUnrecognizedWithSingeLineHeader'>,
    'BlockOrcaUses': <class 'pychemparse.orca_elements.BlockOrcaUses'>,
    'BlockOrcaVersion': <class 'pychemparse.orca_elements.BlockOrcaVersion'>,
    'BlockOrcaVibrationalFrequencies': <class 'pychemparse.orca_elements.BlockOrcaVibrationalFrequencies'>,
    'BlockOrcaWarnings': <class 'pychemparse.orca_elements.BlockOrcaWarnings'>,
    'BlockOrcaWithStandardHeader': <class 'pychemparse.orca_elements.BlockOrcaWithStandardHeader'>
}

class pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionElectricDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaSpectrumType

Parses the ‘ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS’ block.

class pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionVelocityDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaSpectrumType

Parses the ‘ABSORPTION SPECTRUM VIA TRANSITION VELOCITY DIPOLE MOMENTS’ block.

class pychemparse.orca_elements.BlockOrcaAcknowledgement(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaAllRightsReserved(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

This block captures and stores the “All rights reserved” message from ORCA output files.

Example of ORCA Output:

#######################################################
#                        -***-                        #
#          Department of theory and spectroscopy      #
#    Directorship and core code : Frank Neese         #
#        Max Planck Institute fuer Kohlenforschung    #
#                Kaiser Wilhelm Platz 1               #
#                 D-45470 Muelheim/Ruhr               #
#                      Germany                        #
#                                                     #
#                  All rights reserved                #
#                        -***-                        #
#######################################################

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaAuxJBasis(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaCdSpectrum(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaSpectrumType

Parses the ‘CD SPECTRUM’ block.

class pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionElectricDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaSpectrumType

Parses the ‘CD SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS’ block.

class pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionVelocityDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaSpectrumType

Parses the ‘CD SPECTRUM VIA TRANSITION VELOCITY DIPOLE MOMENTS’ block.

class pychemparse.orca_elements.BlockOrcaCiNebConvergence(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores CI-NEB convergence data from ORCA output files.

Example of ORCA Output:

                      .--------------------.
----------------------| CI-Neb convergence |-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
RMS(Fp)             0.0002797716            0.0100000000      YES
MAX(|Fp|)           0.0014572463            0.0200000000      YES
RMS(FCI)            0.0001842330            0.0010000000      YES
MAX(|FCI|)          0.0005858110            0.0020000000      YES
---------------------------------------------------------------------
data() Data

Returns a pychemparse.data.Data object containing:

  • pandas.DataFrame Data with columns Item, Value, Tolerance, Converged.

  • str Comment.

Parsed data example:

'Data':
         Item     Value  Tolerance Converged
0     RMS(Fp)  0.000188      0.010       YES
1   MAX(|Fp|)  0.000727      0.020       YES
2    RMS(FCI)  0.000212      0.001       YES
3  MAX(|FCI|)  0.000644      0.002       YES
Return type:

Data

data_available: bool = True

Formatted data is available for this block.
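The table above is regular enough that a few lines of standard-library Python can recover the same four columns. The sketch below is not pychemparse's implementation: `parse_convergence_table` is a hypothetical helper, and it keeps Value and Tolerance as plain floats instead of building a pandas.DataFrame.

```python
import re

# Hypothetical row pattern: item name, value, tolerance, YES/NO flag.
ROW = re.compile(r"^(\S+)\s+([-+0-9.eE]+)\s+([-+0-9.eE]+)\s+(YES|NO)\s*$")


def parse_convergence_table(text: str) -> list[dict]:
    """Hypothetical helper: turn the rows of a 'CI-Neb convergence' table into dicts."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # title, column header and dashed rules do not match
            item, value, tolerance, converged = match.groups()
            rows.append({
                "Item": item,
                "Value": float(value),
                "Tolerance": float(tolerance),
                "Converged": converged == "YES",
            })
    return rows


block = """\
                      .--------------------.
----------------------| CI-Neb convergence |-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
RMS(Fp)             0.0002797716            0.0100000000      YES
MAX(|Fp|)           0.0014572463            0.0200000000      YES
RMS(FCI)            0.0001842330            0.0010000000      YES
MAX(|FCI|)          0.0005858110            0.0020000000      YES
---------------------------------------------------------------------
"""

rows = parse_convergence_table(block)
all_converged = all(row["Converged"] for row in rows)
```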

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.orca_elements.BlockOrcaContributions(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaDipoleMoment(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores the dipole moment from ORCA output files.

Example of ORCA Output:

-------------
DIPOLE MOMENT
-------------
                                X             Y             Z
Electronic contribution:      0.00000       0.00000       4.52836
Nuclear contribution   :      0.00000       0.00000      -8.26530
                        -----------------------------------------
Total Dipole Moment    :      0.00000       0.00000      -3.73694
                        -----------------------------------------
Magnitude (a.u.)       :      3.73694
Magnitude (Debye)      :      9.49854

or

-------------
DIPOLE MOMENT
-------------

Method             : SCF
Type of density    : Electron Density
Multiplicity       :   1
Irrep              :   0
Energy             :  -379.2946629874107884 Eh
Relativity type    :
Basis              : AO
                                X                 Y                 Z
Electronic contribution:     -0.000041430       0.000000017       4.661630904
Nuclear contribution   :      0.000000009       0.000000000      -8.265300471
                        -----------------------------------------
Total Dipole Moment    :     -0.000041422       0.000000017      -3.603669567
                        -----------------------------------------
Magnitude (a.u.)       :      3.603669567
Magnitude (Debye)      :      9.159800098
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pint.Quantity objects wrapping numpy.ndarray vectors (X, Y, Z) for the Electronic contribution and the Nuclear contribution, in a.u.

  • pint.Quantity Total Dipole Moment as a numpy.ndarray vector (X, Y, Z) in a.u.

  • pint.Quantity Magnitude (a.u.) – the magnitude of the total dipole moment in a.u. The bare number can be obtained with the .magnitude property.

  • pint.Quantity Magnitude (Debye) – the same magnitude in Debye. The bare number can be obtained with the .magnitude property.

Parsed data example:

{
'Electronic contribution': <Quantity([0.      0.      5.37241], 'bohr * elementary_charge')>,
'Nuclear contribution': <Quantity([ 0.      0.     -8.2653], 'bohr * elementary_charge')>,
'Total Dipole Moment': <Quantity([ 0.       0.      -2.89289], 'bohr * elementary_charge')>,
'Magnitude (a.u.)': <Quantity(2.89289, 'bohr * elementary_charge')>,
'Magnitude (Debye)': <Quantity(7.35314, 'debye')>
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
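In the first example above, the total dipole vector is the sum of the electronic and nuclear contributions, and the magnitude is its Euclidean norm. The sketch below checks this with plain floats rather than pint quantities; `parse_vector` is a hypothetical helper, and the Debye conversion uses the CODATA value 1 e·bohr ≈ 2.541746 D, so the last digit can differ slightly from ORCA's own Magnitude (Debye) line.

```python
import math
import re

# CODATA: 1 e*bohr is about 2.541746 debye; ORCA's printed value may differ in the last digit.
AU_TO_DEBYE = 2.541746


def parse_vector(text: str, label: str) -> tuple[float, float, float]:
    """Hypothetical helper: return the X, Y, Z numbers printed after `label :`."""
    match = re.search(re.escape(label) + r"\s*:\s*(\S+)\s+(\S+)\s+(\S+)", text)
    if match is None:
        raise ValueError(f"{label!r} not found")
    x, y, z = (float(v) for v in match.groups())
    return x, y, z


block = """\
                                X             Y             Z
Electronic contribution:      0.00000       0.00000       4.52836
Nuclear contribution   :      0.00000       0.00000      -8.26530
                        -----------------------------------------
Total Dipole Moment    :      0.00000       0.00000      -3.73694
"""

electronic = parse_vector(block, "Electronic contribution")
nuclear = parse_vector(block, "Nuclear contribution")
# Total = electronic + nuclear, component by component.
total = tuple(e + n for e, n in zip(electronic, nuclear))
magnitude_au = math.sqrt(sum(c * c for c in total))
magnitude_debye = magnitude_au * AU_TO_DEBYE
```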

class pychemparse.orca_elements.BlockOrcaErrorMessage(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores error messages from ORCA output files.

Example of ORCA Output:

----------------------------------------------------------------------------
                            ERROR !!!
    The TS optimization did not converge but reached the maximum
    number of optimization cycles.
    As a subsequent Frequencies calculation has been requested
    ORCA will abort at this point of the run.
----------------------------------------------------------------------------
data() Data
Returns:

pychemparse.data.Data object that contains:

  • str Error – the error message, if present

Parsed data example:

{'Error': 'ERROR !!!
The optimization did not converge but reached the maximum
number of optimization cycles.
As a subsequent Frequencies calculation has been requested
ORCA will abort at this point of the run.
Please restart the calculation with the lowest energy geometry and/or
a larger maxiter for the geometry optimization.'}
Return type:

Data

data_available: bool = True

Formatted data is available for this block.
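Extracting the message essentially means dropping the dashed rules that frame it and removing the indentation. A minimal standard-library sketch, assuming the block text is available as a string; `extract_error_message` is hypothetical, and pychemparse may normalize whitespace differently.

```python
def extract_error_message(block: str) -> str:
    """Hypothetical helper: join the non-rule lines of an ORCA error box."""
    lines = []
    for line in block.splitlines():
        stripped = line.strip()
        # Skip blank lines and the dashed rules above and below the message.
        if not stripped or set(stripped) == {"-"}:
            continue
        lines.append(stripped)
    return "\n".join(lines)


block = """\
----------------------------------------------------------------------------
                            ERROR !!!
    The TS optimization did not converge but reached the maximum
    number of optimization cycles.
    As a subsequent Frequencies calculation has been requested
    ORCA will abort at this point of the run.
----------------------------------------------------------------------------
"""

data = {"Error": extract_error_message(block)}
```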

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaFinalSinglePointEnergy(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the final single-point energy from ORCA output files.

Example of ORCA Output:

-------------------------   --------------------
FINAL SINGLE POINT ENERGY      -379.259324337759
-------------------------   --------------------
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pint.Quantity Energy

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
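The energy is the number printed on the FINAL SINGLE POINT ENERGY line, in Hartree (Eh). The sketch below is a hypothetical standard-library equivalent, not pychemparse's code; if a text contains several such blocks (for example from a multi-step run), it keeps the last one.

```python
import re

# Hypothetical pattern for the energy line; the value is in Hartree.
ENERGY = re.compile(r"FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)")


def final_energy_hartree(text: str):
    """Return the last printed single-point energy as a float, or None if absent."""
    values = ENERGY.findall(text)
    return float(values[-1]) if values else None


block = """\
-------------------------   --------------------
FINAL SINGLE POINT ENERGY      -379.259324337759
-------------------------   --------------------
"""

energy = final_energy_hartree(block)
```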

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaGeometryConvergence(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores geometry convergence data from ORCA output files.

Example of ORCA Output:

                      .--------------------.
----------------------|Geometry convergence|-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
Energy change       0.0000035570            0.0000050000      YES
RMS gradient        0.0000436223            0.0001000000      YES
MAX gradient        0.0002094156            0.0003000000      YES
RMS step            0.0022222022            0.0020000000      NO
MAX step            0.0170204003            0.0040000000      NO
........................................................
Max(Bonds)      0.0003      Max(Angles)    0.02
Max(Dihed)        0.98      Max(Improp)    0.00
---------------------------------------------------------------------
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pandas.DataFrame Geometry convergence data

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
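In the example above, each Converged flag agrees with comparing |Value| against Tolerance, and the table proper ends at the dotted line; the Max(Bonds)/Max(Angles) summary after it has a different layout. The sketch below relies on both observations. `parse_rows` is a hypothetical helper, not part of pychemparse, and it returns tuples instead of a pandas.DataFrame.

```python
import re

# Hypothetical row pattern: name (may contain single spaces), value, tolerance, flag.
ROW = re.compile(r"^(\S.*?)\s{2,}([-+0-9.eE]+)\s+([-+0-9.eE]+)\s+(YES|NO)\s*$")


def parse_rows(block: str) -> list[tuple[str, float, float, bool]]:
    rows = []
    for line in block.splitlines():
        if line.startswith("....."):
            break  # internal-coordinate summary follows; stop here
        match = ROW.match(line)
        if match:
            name, value, tolerance, flag = match.groups()
            rows.append((name, float(value), float(tolerance), flag == "YES"))
    return rows


block = """\
                      .--------------------.
----------------------|Geometry convergence|-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
Energy change       0.0000035570            0.0000050000      YES
RMS gradient        0.0000436223            0.0001000000      YES
MAX gradient        0.0002094156            0.0003000000      YES
RMS step            0.0022222022            0.0020000000      NO
MAX step            0.0170204003            0.0040000000      NO
........................................................
Max(Bonds)      0.0003      Max(Angles)    0.02
Max(Dihed)        0.98      Max(Improp)    0.00
---------------------------------------------------------------------
"""

rows = parse_rows(block)
not_converged = [name for name, _, _, ok in rows if not ok]
# In this example every flag matches |value| <= tolerance.
consistent = all((abs(value) <= tol) == ok for _, value, tol, ok in rows)
```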

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaHurray(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.orca_elements.BlockOrcaHurrayCI(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaHurray

class pychemparse.orca_elements.BlockOrcaHurrayOptimization(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaHurray

class pychemparse.orca_elements.BlockOrcaHurrayTS(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaHurray

class pychemparse.orca_elements.BlockOrcaIcon(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the ORCA ASCII-art logo from ORCA output files.

Example of ORCA Output:

                                       #,
                                       ###
                                       ####
                                       #####
                                       ######
                                      ########,
                                ,,################,,,,,
                          ,,#################################,,
                     ,,##########################################,,
                  ,#########################################, ''#####,
               ,#############################################,,   '####,
             ,##################################################,,,,####,
           ,###########''''           ''''###############################
         ,#####''   ,,,,##########,,,,          '''####'''          '####
       ,##' ,,,,###########################,,,                        '##
      ' ,,###''''                  '''############,,,
    ,,##''                                '''############,,,,        ,,,,,,###''
 ,#''                                            '''#######################'''
'                                                          ''''####''''
        ,#######,   #######,   ,#######,      ##
       ,#'     '#,  ##    ##  ,#'     '#,    #''#        ######   ,####,
       ##       ##  ##   ,#'  ##            #'  '#       #        #'  '#
       ##       ##  #######   ##           ,######,      #####,   #    #
       '#,     ,#'  ##    ##  '#,     ,#' ,#      #,         ##   #,  ,#
        '#######'   ##     ##  '#######'  #'      '#     #####' # '####'
data() Data

The icon is purely decorative: there is nothing to extract except the ASCII art.

data_available: bool = True

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.orca_elements.BlockOrcaInputFile(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaLibXc(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaLibint2(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaOrbitalBasis(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaOrbitalEnergies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores orbital energies and occupation numbers from ORCA output files.

Example of ORCA Output:

----------------
ORBITAL ENERGIES
----------------
NO   OCC          E(Eh)            E(eV)
0   2.0000     -14.038014      -381.9938
1   2.0000     -13.986101      -380.5812
2   2.0000      -0.200360        -5.4521
3   0.0000      -0.065149        -1.7728
4   0.0000      -0.060749        -1.6531
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pandas.DataFrame Orbitals

    with the columns NO, OCC, E(Eh), and E(eV). The E(Eh) and E(eV) columns hold the same orbital energies in Hartree and electronvolts, respectively. Both are read directly from the output file, so they should agree after unit conversion unless the ORCA output itself is inconsistent.

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
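A common use of this table is locating the frontier orbitals. The sketch below works on plain tuples rather than the pandas.DataFrame returned by data(); it assumes a listing in which occupied orbitals have OCC > 0, and it checks the E(eV) column against E(Eh) using 1 Eh ≈ 27.211386 eV.

```python
HARTREE_TO_EV = 27.211386  # CODATA value, rounded

block = """\
NO   OCC          E(Eh)            E(eV)
0   2.0000     -14.038014      -381.9938
1   2.0000     -13.986101      -380.5812
2   2.0000      -0.200360        -5.4521
3   0.0000      -0.065149        -1.7728
4   0.0000      -0.060749        -1.6531
"""

rows = []
for line in block.splitlines()[1:]:  # skip the column-title line
    no, occ, e_eh, e_ev = line.split()
    rows.append((int(no), float(occ), float(e_eh), float(e_ev)))

occupied = [r for r in rows if r[1] > 0.0]
virtual = [r for r in rows if r[1] == 0.0]
homo = max(occupied, key=lambda r: r[2])  # highest occupied orbital
lumo = min(virtual, key=lambda r: r[2])   # lowest unoccupied orbital
gap_ev = lumo[3] - homo[3]

# The two energy columns agree to the 4 decimals printed for E(eV).
consistent = all(abs(e_eh * HARTREE_TO_EV - e_ev) < 1e-3 for _, _, e_eh, e_ev in rows)
```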

class pychemparse.orca_elements.BlockOrcaPathSummaryForNebCi(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaPathSummaryForNebTs

The block captures and stores NEB-CI path summary data from ORCA output files.

Example of ORCA Output:

---------------------------------------------------------------
                        PATH SUMMARY
---------------------------------------------------------------
All forces in Eh/Bohr.

Image Dist.(Ang.)    E(Eh)   dE(kcal/mol)  max(|Fp|)  RMS(Fp)
0     0.000   -1040.28151      0.00       0.00024   0.00008
1     4.329   -1040.26830      8.29       0.00103   0.00025
2     6.607   -1040.25791     14.81       0.00120   0.00029
3     8.283   -1040.25022     19.64       0.00174   0.00042
4     9.599   -1040.24240     24.54       0.00116   0.00026
5    10.780   -1040.23790     27.37       0.00047   0.00015 <= CI
6    12.215   -1040.24200     24.80       0.00098   0.00026
7    13.815   -1040.25258     18.16       0.00076   0.00021
8    16.040   -1040.26419     10.87       0.00043   0.00013
9    19.933   -1040.27575      3.62       0.00012   0.00004

Straight line distance between images along the path:
        D( 0- 1) =   4.3288 Ang.
        D( 1- 2) =   2.2782 Ang.
        D( 2- 3) =   1.6757 Ang.
        D( 3- 4) =   1.3168 Ang.
        D( 4- 5) =   1.1801 Ang.
        D( 5- 6) =   1.4358 Ang.
        D( 6- 7) =   1.5995 Ang.
        D( 7- 8) =   2.2254 Ang.
        D( 8- 9) =   3.8933 Ang.
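In this example, the Dist.(Ang.) column matches, to within the printed rounding, the running sum of the straight-line distances D(i-j) listed below the table, and the climbing image is the row marked <= CI. The sketch below checks both with the standard library only; the relationship is an observation about this output, not a documented guarantee, and the regular expressions are assumptions.

```python
import itertools
import re

IMAGE_ROW = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+(-?\d+\.\d+)")
D_LINE = re.compile(r"D\(\s*(\d+)-\s*(\d+)\)\s*=\s*(\d+\.\d+)")

block = """\
Image Dist.(Ang.)    E(Eh)   dE(kcal/mol)  max(|Fp|)  RMS(Fp)
0     0.000   -1040.28151      0.00       0.00024   0.00008
1     4.329   -1040.26830      8.29       0.00103   0.00025
2     6.607   -1040.25791     14.81       0.00120   0.00029
3     8.283   -1040.25022     19.64       0.00174   0.00042
4     9.599   -1040.24240     24.54       0.00116   0.00026
5    10.780   -1040.23790     27.37       0.00047   0.00015 <= CI
6    12.215   -1040.24200     24.80       0.00098   0.00026
7    13.815   -1040.25258     18.16       0.00076   0.00021
8    16.040   -1040.26419     10.87       0.00043   0.00013
9    19.933   -1040.27575      3.62       0.00012   0.00004

Straight line distance between images along the path:
        D( 0- 1) =   4.3288 Ang.
        D( 1- 2) =   2.2782 Ang.
        D( 2- 3) =   1.6757 Ang.
        D( 3- 4) =   1.3168 Ang.
        D( 4- 5) =   1.1801 Ang.
        D( 5- 6) =   1.4358 Ang.
        D( 6- 7) =   1.5995 Ang.
        D( 7- 8) =   2.2254 Ang.
        D( 8- 9) =   3.8933 Ang.
"""

dists = []
ci_image = None
for line in block.splitlines():
    match = IMAGE_ROW.match(line)
    if match:
        dists.append(float(match.group(2)))
        if "<= CI" in line:
            ci_image = int(match.group(1))

steps = [float(m.group(3)) for m in D_LINE.finditer(block)]
running = list(itertools.accumulate(steps, initial=0.0))  # cumulative path length
matches = all(abs(a - b) < 1e-3 for a, b in zip(dists, running))
```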
class pychemparse.orca_elements.BlockOrcaPathSummaryForNebTs(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores NEB-TS path summary data from ORCA output files.

Example of ORCA Output:

---------------------------------------------------------------
              PATH SUMMARY FOR NEB-TS
---------------------------------------------------------------
All forces in Eh/Bohr. Global forces for TS.

Image     E(Eh)   dE(kcal/mol)  max(|Fp|)  RMS(Fp)
0   -1040.28151     0.00       0.00024   0.00008
1   -1040.26641     9.48       0.00357   0.00076
2   -1040.25443    17.00       0.00387   0.00111
3   -1040.24519    22.79       0.00279   0.00095
4   -1040.23692    27.98       0.00459   0.00133
5   -1040.23342    30.18       0.00189   0.00067 <= CI
TS   -1040.23850    26.99       0.00022   0.00005 <= TS
6   -1040.23665    28.15       0.00216   0.00079
7   -1040.24833    20.82       0.00200   0.00076
8   -1040.26217    12.14       0.00200   0.00058
9   -1040.27575     3.62       0.00012   0.00004
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pandas.DataFrame Data with columns Image, E(Eh), dE(kcal/mol), max(|Fp|), RMS(Fp), and Comment. Energies and forces are stored as pint.Quantity values, and the Comment column carries the <= CI and <= TS markers.

Parsed data example:

{'Data':
       Image                      E(Eh)              dE(kcal/mol)
    0      0  -1040.28151 electron_volt    0.0 kilocalorie / mole
    1      1  -1040.27082 electron_volt   6.71 kilocalorie / mole
    2      2   -1040.2608 electron_volt   13.0 kilocalorie / mole
    3      3   -1040.2518 electron_volt  18.64 kilocalorie / mole
    4      4  -1040.24453 electron_volt  23.21 kilocalorie / mole
    5      5  -1040.24169 electron_volt  24.99 kilocalorie / mole
    6     TS  -1040.24272 electron_volt  24.34 kilocalorie / mole
    7      6  -1040.24575 electron_volt  22.44 kilocalorie / mole
    8      7  -1040.25472 electron_volt  16.81 kilocalorie / mole
    9      8  -1040.26597 electron_volt   9.75 kilocalorie / mole
    10     9  -1040.27575 electron_volt   3.62 kilocalorie / mole

                    max(|Fp|)                 RMS(Fp) Comment
    0   0.00023 hartree / bohr    7e-05 hartree / bohr
    1   0.00068 hartree / bohr  0.00023 hartree / bohr
    2   0.00072 hartree / bohr  0.00023 hartree / bohr
    3   0.00073 hartree / bohr  0.00022 hartree / bohr
    4   0.00067 hartree / bohr   0.0002 hartree / bohr
    5   0.00063 hartree / bohr  0.00021 hartree / bohr   <= CI
    6     7e-05 hartree / bohr    2e-05 hartree / bohr   <= TS
    7   0.00058 hartree / bohr  0.00021 hartree / bohr
    8   0.00055 hartree / bohr  0.00019 hartree / bohr
    9   0.00065 hartree / bohr  0.00019 hartree / bohr
    10  0.00018 hartree / bohr    5e-05 hartree / bohr
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
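In the ORCA example above, dE(kcal/mol) is the energy of each image relative to image 0, converted with 1 Eh ≈ 627.5095 kcal/mol; because E(Eh) is printed to 5 decimals, recomputed values agree with the printed ones to about 0.01 kcal/mol. The sketch below checks this with plain tuples and picks out the row marked <= TS. It is an illustration, not pychemparse's parser.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 Eh in kcal/mol, rounded

block = """\
Image     E(Eh)   dE(kcal/mol)  max(|Fp|)  RMS(Fp)
0   -1040.28151     0.00       0.00024   0.00008
1   -1040.26641     9.48       0.00357   0.00076
2   -1040.25443    17.00       0.00387   0.00111
3   -1040.24519    22.79       0.00279   0.00095
4   -1040.23692    27.98       0.00459   0.00133
5   -1040.23342    30.18       0.00189   0.00067 <= CI
TS   -1040.23850    26.99       0.00022   0.00005 <= TS
6   -1040.23665    28.15       0.00216   0.00079
7   -1040.24833    20.82       0.00200   0.00076
8   -1040.26217    12.14       0.00200   0.00058
9   -1040.27575     3.62       0.00012   0.00004
"""

rows = []
for line in block.splitlines()[1:]:  # skip the column-title line
    fields = line.split()
    marker = " ".join(fields[5:])  # "<= CI", "<= TS" or ""
    rows.append((fields[0], float(fields[1]), float(fields[2]), marker))

reference = rows[0][1]  # image 0 is the energy reference
recomputed = {label: (e - reference) * HARTREE_TO_KCAL_PER_MOL for label, e, _, _ in rows}
max_deviation = max(abs(recomputed[label] - de) for label, _, de, _ in rows)
ts_label = next(label for label, _, _, marker in rows if marker == "<= TS")
```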

class pychemparse.orca_elements.BlockOrcaRotationalSpectrum(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaScf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaScfType

The block captures and stores SCF data from ORCA output files.

Example of ORCA Output:

-------------------------------------S-C-F---------------------------------------
Iteration    Energy (Eh)           Delta-E    RMSDP     MaxDP     Damp  Time(sec)
---------------------------------------------------------------------------------
            ***  Starting incremental Fock matrix formation  ***
                            *** Initializing SOSCF ***
                            *** Constraining orbitals ***
                            *** Switching to L-BFGS ***
Constrained orbitals (energetic order)
30 31
Constrained orbitals (compact order)
31 30
data() Data

Returns a pychemparse.data.Data object containing:

  • pandas.DataFrame Data with columns Iteration, Energy (Eh), Delta-E, RMSDP, MaxDP, Damp, Time(sec):
    • Time(sec) is represented as a timedelta object.

    • Energy (Eh) is represented as a pint.Quantity; the bare number can be extracted with the .magnitude property.

  • pandas.DataFrame Comments with columns Iteration and Comment.

  • str Name of the block.

Parsed data example:

{'Data': Empty DataFrame
Columns: [Iteration, Energy (Eh), Delta-E, RMSDP, MaxDP, Damp, Time(sec)]
Index: [],
'Comments':    Iteration                                            Comment
0          0  ***  Starting incremental Fock matrix formatio...
1          0                         *** Initializing SOSCF ***
2          0                      *** Constraining orbitals ***
3          0                        *** Switching to L-BFGS ***,
'Name': 'S-C-F'}
Return type:

Data

data_available: bool = True

Formatted data is available for this block.

class pychemparse.orca_elements.BlockOrcaScfConverged(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores SCF convergence message from ORCA output files.

Example of ORCA Output:

*****************************************************
*                     SUCCESS                       *
*           SCF CONVERGED AFTER  20 CYCLES          *
*****************************************************
data() Data
Returns:

pychemparse.data.Data object that contains:

  • bool for Success of the extraction

  • int for the number of Cycles

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
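A minimal sketch of how the two fields could be recovered from the banner. The dictionary keys mirror the description above, but the regular expression, and treating Success as "banner found", are assumptions rather than pychemparse's code.

```python
import re

# Hypothetical pattern for the convergence banner line.
CONVERGED = re.compile(r"SCF CONVERGED AFTER\s+(\d+)\s+CYCLES")

block = """\
*****************************************************
*                     SUCCESS                       *
*           SCF CONVERGED AFTER  20 CYCLES          *
*****************************************************
"""

match = CONVERGED.search(block)
data = {"Success": match is not None, "Cycles": int(match.group(1)) if match else None}
```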

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaScfType(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores SCF data from ORCA output files.

Example of ORCA Output:

-------------------------------------S-C-F---------------------------------------
Iteration    Energy (Eh)           Delta-E    RMSDP     MaxDP     Damp  Time(sec)
---------------------------------------------------------------------------------
            ***  Starting incremental Fock matrix formation  ***
                            *** Initializing SOSCF ***
                            *** Constraining orbitals ***
                            *** Switching to L-BFGS ***
Constrained orbitals (energetic order)
30 31
Constrained orbitals (compact order)
31 30

or

---------------------------------------S-O-S-C-F--------------------------------------
Iteration    Energy (Eh)            Delta-E     RMSDP    MaxDP     MaxGrad    Time(sec)
--------------------------------------------------------------------------------------
    1    -379.2796837014277571     0.00e+00  0.00e+00  0.00e+00  3.00e-02   0.3
            *** Restarting incremental Fock matrix formation ***
    2    -379.2796837014277571     0.00e+00  5.39e-03  2.36e-01  3.00e-02   0.3
    3    -379.2788786204820326     8.05e-04  2.96e-03  1.30e-01  3.68e-02   0.3
    4    -379.2897810987828962    -1.09e-02  1.69e-03  1.37e-01  9.46e-03   0.2
    5    -379.2878642728886689     1.92e-03  8.10e-04  7.42e-02  1.68e-02   0.2
    6    -379.2909711775516826    -3.11e-03  7.04e-04  3.11e-02  4.12e-03   0.2
                    ***Gradient convergence achieved***
                        *** Unconstraining orbitals ***
        *** Restarting Hessian update and switching to L-SR1 ***
    7    -379.2904844538218185     4.87e-04  1.27e-03  4.23e-02  1.72e-02   0.2
    8    -379.2892451088814596     1.24e-03  9.18e-04  7.40e-02  2.70e-02   0.3
    9    -379.2943354063930883    -5.09e-03  3.93e-04  1.30e-02  2.96e-03   0.2
    10    -379.2945957143243731    -2.60e-04  2.02e-04  7.28e-03  1.37e-03   0.2
    11    -379.2946565737383935    -6.09e-05  7.77e-04  4.26e-02  4.79e-04   0.2
    12    -379.2946442625134296     1.23e-05  3.75e-03  2.20e-01  9.99e-04   0.2
    13    -379.2946572622200847    -1.30e-05  8.03e-04  4.98e-02  7.21e-04   0.2
    14    -379.2946626618473829    -5.40e-06  4.17e-04  2.11e-02  1.05e-04   0.2
    15    -379.2946629954453783    -3.34e-07  1.04e-03  5.09e-02  4.80e-05   0.2
                        ***Gradient convergence achieved***
extract_name_header_and_body() tuple[str, str | None, str]

Identifies and separates the name, header, and body of the block based on an SCF header format.

Utilizes regular expressions to discern the header portion from the body, processing the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.

Returns

tuple[str, str | None, str]

The name of the block, the header content (or None if a header is not present), and the body of the block.
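The idea can be sketched with one regular expression: the block name is the hyphenated token embedded in the top dashed rule (S-C-F or S-O-S-C-F). The sketch below is hypothetical; in particular, treating the column-title line plus the rule beneath it as the "header", and returning an empty name when no rule is found, are assumptions, not the library's definitions.

```python
from __future__ import annotations

import re

# Hypothetical: a line of dashes with a token like S-C-F or S-O-S-C-F in it.
HEADER_RULE = re.compile(r"^-+((?:[A-Z]-)+[A-Z])-+$", re.MULTILINE)


def split_scf_block(raw: str) -> tuple[str, str | None, str]:
    match = HEADER_RULE.search(raw)
    if match is None:
        return "", None, raw  # no SCF-style header found
    name = match.group(1)
    lines = raw[match.end():].lstrip("\n").splitlines()
    header = "\n".join(lines[:2])  # assumed: column titles + dashed rule
    body = "\n".join(lines[2:])
    return name, header, body


scf_block = """\
-------------------------------------S-C-F---------------------------------------
Iteration    Energy (Eh)           Delta-E    RMSDP     MaxDP     Damp  Time(sec)
---------------------------------------------------------------------------------
            ***  Starting incremental Fock matrix formation  ***
"""

name, header, body = split_scf_block(scf_block)
```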

class pychemparse.orca_elements.BlockOrcaShark(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaSoscf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaScf

The block captures and stores SOSCF data from ORCA output files.

Example of ORCA Output:

---------------------------------------S-O-S-C-F--------------------------------------
Iteration    Energy (Eh)            Delta-E     RMSDP    MaxDP     MaxGrad    Time(sec)
--------------------------------------------------------------------------------------
    1    -379.2796837014277571     0.00e+00  0.00e+00  0.00e+00  3.00e-02   0.3
            *** Restarting incremental Fock matrix formation ***
    2    -379.2796837014277571     0.00e+00  5.39e-03  2.36e-01  3.00e-02   0.3
    3    -379.2788786204820326     8.05e-04  2.96e-03  1.30e-01  3.68e-02   0.3
    4    -379.2897810987828962    -1.09e-02  1.69e-03  1.37e-01  9.46e-03   0.2
    5    -379.2878642728886689     1.92e-03  8.10e-04  7.42e-02  1.68e-02   0.2
    6    -379.2909711775516826    -3.11e-03  7.04e-04  3.11e-02  4.12e-03   0.2
                    ***Gradient convergence achieved***
                        *** Unconstraining orbitals ***
        *** Restarting Hessian update and switching to L-SR1 ***
    7    -379.2904844538218185     4.87e-04  1.27e-03  4.23e-02  1.72e-02   0.2
    8    -379.2892451088814596     1.24e-03  9.18e-04  7.40e-02  2.70e-02   0.3
    9    -379.2943354063930883    -5.09e-03  3.93e-04  1.30e-02  2.96e-03   0.2
    10    -379.2945957143243731    -2.60e-04  2.02e-04  7.28e-03  1.37e-03   0.2
    11    -379.2946565737383935    -6.09e-05  7.77e-04  4.26e-02  4.79e-04   0.2
    12    -379.2946442625134296     1.23e-05  3.75e-03  2.20e-01  9.99e-04   0.2
    13    -379.2946572622200847    -1.30e-05  8.03e-04  4.98e-02  7.21e-04   0.2
    14    -379.2946626618473829    -5.40e-06  4.17e-04  2.11e-02  1.05e-04   0.2
    15    -379.2946629954453783    -3.34e-07  1.04e-03  5.09e-02  4.80e-05   0.2
                        ***Gradient convergence achieved***
data() Data

Returns a pychemparse.data.Data object containing:

  • pandas.DataFrame Data with columns Iteration, Energy (Eh), Delta-E, RMSDP, MaxDP, MaxGrad, Time(sec):
    • Time(sec) is represented as a timedelta object.

    • Energy (Eh) is represented as a pint.Quantity; the bare number can be extracted with the .magnitude property.

  • pandas.DataFrame Comments with columns Iteration and Comment.

  • str Name of the block.

Parsed data example:

{'Data':
Iteration                  Energy (Eh)       Delta-E     RMSDP   MaxDP      MaxGrad              Time(sec)
0           1  -440.42719635301455 hartree  0.000000e+00  0.000000  0.0000   0.029500 0 days 00:00:00.500000
1           2  -440.42719635301455 hartree  0.000000e+00  0.004710  0.2320   0.029500 0 days 00:00:00.400000
2           3     -440.49687163902 hartree -6.970000e-02  0.012600  1.1300   0.011100 0 days 00:00:00.400000,

'Comments':
Iteration                                            Comment
0          1  *** Restarting incremental Fock matrix formati...
1         13         **** Energy Check signals convergence ****
2         13                    *** Unconstraining orbitals ***
3         13  *** Restarting Hessian update and switching to...
4         21  *** Restarting incremental Fock matrix formati...
5         33         **** Energy Check signals convergence ****,
'Name': 'S-O-S-C-F'}
Return type:

Data
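In the parsed examples, a starred comment line is attributed to the most recent iteration number, and comments that precede the first iteration get 0. The sketch below reproduces that bookkeeping with the standard library: numeric rows become iteration records (with Time(sec) converted to datetime.timedelta), and starred lines become comments. It is an illustration under those assumptions, not pychemparse's parser.

```python
import datetime

block = """\
    1    -379.2796837014277571     0.00e+00  0.00e+00  0.00e+00  3.00e-02   0.3
            *** Restarting incremental Fock matrix formation ***
    2    -379.2796837014277571     0.00e+00  5.39e-03  2.36e-01  3.00e-02   0.3
    3    -379.2788786204820326     8.05e-04  2.96e-03  1.30e-01  3.68e-02   0.3
    4    -379.2897810987828962    -1.09e-02  1.69e-03  1.37e-01  9.46e-03   0.2
    5    -379.2878642728886689     1.92e-03  8.10e-04  7.42e-02  1.68e-02   0.2
    6    -379.2909711775516826    -3.11e-03  7.04e-04  3.11e-02  4.12e-03   0.2
                    ***Gradient convergence achieved***
                        *** Unconstraining orbitals ***
        *** Restarting Hessian update and switching to L-SR1 ***
    7    -379.2904844538218185     4.87e-04  1.27e-03  4.23e-02  1.72e-02   0.2
"""

iterations, comments = [], []
last_iteration = 0  # comments before the first iteration are attributed to 0
for line in block.splitlines():
    stripped = line.strip()
    if not stripped:
        continue
    fields = stripped.split()
    if fields[0].isdigit():
        last_iteration = int(fields[0])
        iterations.append({
            "Iteration": last_iteration,
            "Energy (Eh)": float(fields[1]),
            "Delta-E": float(fields[2]),
            "Time(sec)": datetime.timedelta(seconds=float(fields[-1])),
        })
    elif stripped.startswith("***"):
        comments.append({"Iteration": last_iteration, "Comment": stripped})
```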

class pychemparse.orca_elements.BlockOrcaSpectrumType(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores spectrum data from ORCA output files.

Example of ORCA Output:

-----------------------------------------------------------------------------
 ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS
-----------------------------------------------------------------------------
State   Energy    Wavelength  fosc         T2        TX        TY        TZ  
        (cm-1)      (nm)                 (au**2)    (au)      (au)      (au) 
-----------------------------------------------------------------------------
1   16903.5    591.6   0.000000000   0.00000   0.00000  -0.00000  -0.00000
2   22365.6    447.1   0.000000000   0.00000  -0.00000  -0.00000   0.00000
3   23649.8    422.8   0.000000000   0.00000   0.00000   0.00000   0.00000
4   25396.9    393.7   0.002096634   0.02718  -0.13602   0.04159   0.08336
5   26409.5    378.7   0.626104251   7.80481  -1.15488  -2.10174   1.43308
6   28468.1    351.3   0.000000000   0.00000   0.00000   0.00000  -0.00000
7   28944.3    345.5   0.000000000   0.00000   0.00000  -0.00000  -0.00000
8   28964.5    345.3   0.000000000   0.00000  -0.00000   0.00000   0.00000
9   29986.3    333.5   0.025998669   0.28543  -0.44658  -0.18107   0.23069
10   30178.3    331.4   0.000000000   0.00000   0.00000   0.00000  -0.00000
11   31055.6    322.0   0.000000000   0.00000  -0.00000  -0.00000   0.00000
12   32047.8    312.0   0.000000000   0.00000  -0.00000  -0.00000   0.00000
13   32343.8    309.2   0.000000000   0.00000  -0.00000   0.00000  -0.00000
14   32365.6    309.0   0.012474234   0.12688  -0.23853   0.23551  -0.12051
15   32454.2    308.1   0.023480417   0.23818   0.00690  -0.48392   0.06292
16   33446.2    299.0   0.001756413   0.01729  -0.06205   0.06809  -0.09382
17   34637.6    288.7   0.000000000   0.00000   0.00000  -0.00000  -0.00000
18   35255.9    283.6   0.000000000   0.00000   0.00000  -0.00000  -0.00000
...
data() Data

Parses the spectrum block and returns a Data object containing a DataFrame with units applied.

Returns

Data

The parsed data.

Parsed data example:

'Data':         Transition             Energy (eV)         Energy (cm-1)
0     0-1A -> 1-3A   2.09817 electron_volt  16922.9 / centimeter
1     0-1A -> 2-3A  2.773662 electron_volt  22371.1 / centimeter   
2     0-1A -> 3-3A  2.932123 electron_volt  23649.2 / centimeter   
3     0-1A -> 4-1A  3.149596 electron_volt  25403.2 / centimeter   
4     0-1A -> 5-1A  3.275905 electron_volt  26422.0 / centimeter   
..             ...                     ...                   ...   
95   0-1A -> 96-1A  6.641358 electron_volt  53566.2 / centimeter   
96   0-1A -> 97-3A  6.656552 electron_volt  53688.7 / centimeter   
97   0-1A -> 98-3A  6.701102 electron_volt  54048.0 / centimeter   
98   0-1A -> 99-3A  6.727343 electron_volt  54259.7 / centimeter   
99  0-1A -> 100-1A  6.746272 electron_volt  54412.4 / centimeter   

    Wavelength (nm)  fosc(D2)                                  D2 (au**2)
0   590.9 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2
1   447.0 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2   
2   422.8 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2   
3   393.7 nanometer  0.002212  0.02866 bohr ** 2 * elementary_charge ** 2   
4   378.5 nanometer  0.624346  7.77921 bohr ** 2 * elementary_charge ** 2   
..              ...       ...                                         ...   
95  186.7 nanometer  0.191542   1.1772 bohr ** 2 * elementary_charge ** 2   
96  186.3 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2   
97  185.0 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2   
98  184.3 nanometer  0.000000      0.0 bohr ** 2 * elementary_charge ** 2   
99  183.8 nanometer  0.233894  1.41514 bohr ** 2 * elementary_charge ** 2   

                            DX (au)                            DY (au)
0        0.0 bohr * elementary_charge      -0.0 bohr * elementary_charge
1        0.0 bohr * elementary_charge       0.0 bohr * elementary_charge   
2       -0.0 bohr * elementary_charge      -0.0 bohr * elementary_charge   
3   -0.14047 bohr * elementary_charge   0.03394 bohr * elementary_charge   
4   -1.15138 bohr * elementary_charge  -2.09801 bohr * elementary_charge   
..                                ...                                ...   
95   0.94121 bohr * elementary_charge  -0.11391 bohr * elementary_charge   
96       0.0 bohr * elementary_charge      -0.0 bohr * elementary_charge   
97       0.0 bohr * elementary_charge      -0.0 bohr * elementary_charge   
98      -0.0 bohr * elementary_charge      -0.0 bohr * elementary_charge   
99   0.60355 bohr * elementary_charge   -0.5407 bohr * elementary_charge   

                            DZ (au)  
0       0.0 bohr * elementary_charge  
1       0.0 bohr * elementary_charge  
2       0.0 bohr * elementary_charge  
3   0.08819 bohr * elementary_charge  
4   1.43244 bohr * elementary_charge  
..                               ...  
95  0.52758 bohr * elementary_charge  
96      0.0 bohr * elementary_charge  
97      0.0 bohr * elementary_charge  
98      0.0 bohr * elementary_charge  
99  0.87092 bohr * elementary_charge  
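In the ORCA 5 table above, the Wavelength column is the excitation energy in cm-1 converted to nm (wavelength in nm = 1e7 / energy in cm-1). The following stdlib-only sketch parses the first five rows of that example and checks the relation. The helper `parse_absorption_rows` and its row keys are hypothetical and do not reproduce the DataFrame columns returned by `data()`.

```python
# First five rows of the ORCA 5 absorption-spectrum example above.
SAMPLE = """\
1   16903.5    591.6   0.000000000   0.00000   0.00000  -0.00000  -0.00000
2   22365.6    447.1   0.000000000   0.00000  -0.00000  -0.00000   0.00000
3   23649.8    422.8   0.000000000   0.00000   0.00000   0.00000   0.00000
4   25396.9    393.7   0.002096634   0.02718  -0.13602   0.04159   0.08336
5   26409.5    378.7   0.626104251   7.80481  -1.15488  -2.10174   1.43308
"""


def parse_absorption_rows(text: str) -> list[dict]:
    """Hypothetical helper: turn ORCA 5-style spectrum rows into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns: State, Energy, Wavelength, fosc, T2, TX, TY, TZ.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        state, energy, wavelength, fosc, t2, tx, ty, tz = fields
        rows.append({
            "state": int(state),
            "energy_cm-1": float(energy),
            "wavelength_nm": float(wavelength),
            "fosc": float(fosc),
            "T2": float(t2),
            "T": (float(tx), float(ty), float(tz)),
        })
    return rows


rows = parse_absorption_rows(SAMPLE)
for row in rows:
    # The printed wavelength (nm) is 1e7 divided by the wavenumber (cm-1).
    assert round(1e7 / row["energy_cm-1"], 1) == row["wavelength_nm"]

brightest = max(rows, key=lambda r: r["fosc"])
print(brightest["state"], brightest["fosc"])  # 5 0.626104251
```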
data_available: bool = True

Formatted data is available for this block.

extract_name_header_and_body() tuple[str, str | None, str]

Identifies and separates the name, header, and body of the block.

Returns

Tuple[str, Optional[str], str]

The name of the block, the header content (or None if a header is not present), and the body of the block.

class pychemparse.orca_elements.BlockOrcaTddftExcitations(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores TD-DFT excited-state data from ORCA output files. It is the base class for both the singlet TD-DFT block and the TD-DFT/TDA block.

Example of ORCA Output:

--------------------------------
TD-DFT EXCITED STATES (SINGLETS)
--------------------------------
the weight of the individual excitations are printed if larger than 0.01

STATE  1:  E=   0.154808 au      4.213 eV    33976.3 cm**-1 <S**2> =   0.000000
    29a ->  31a  :     0.078253
    30a ->  32a  :     0.907469

or

-------------------------
TD-DFT/TDA EXCITED STATES
-------------------------
the weight of the individual excitations are printed if larger than 1.0e-02

UHF/UKS reference: multiplicity estimated based on rounded <S**2> value, RELEVANCE IS LIMITED!

STATE  1:  E=   0.077106 au      2.098 eV    16922.9 cm**-1 <S**2> =   2.000000 Mult 3
    90a ->  91a  :     0.468442 (c=  0.68442790)
    90b ->  91b  :     0.468442 (c= -0.68442790)

STATE  2:  E=   0.101930 au      2.774 eV    22371.1 cm**-1 <S**2> =   2.000000 Mult 3
    89a ->  91a  :     0.418245 (c=  0.64671829)
    89a ->  92a  :     0.050001 (c= -0.22360974)
    89b ->  91b  :     0.418245 (c= -0.64671829)
    89b ->  92b  :     0.050001 (c=  0.22360974)
data() Data
Returns:

pychemparse.data.Data object that contains:

  • int state numbers as keys, each mapped to a sub-dict with the state's details:

    -Energy (eV) stored as pint.Quantity

    -Transitions: a list of dicts, each containing the From Orbital (str: number+a|b), To Orbital (str: number+a|b), and Coefficient (float)

Parsed data example:

{
1: {
    'Energy (eV)': <Quantity(4.647, 'electron_volt')>,
    'Transitions': [
            {'From Orbital': '29a', 'To Orbital': '32a',
                'Coefficient': 0.055845},
            {'From Orbital': '30a', 'To Orbital': '31a',
                'Coefficient': 0.906577}
        ]
    },
# Additional states follow the same structure
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
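A minimal sketch of how the TD-DFT/TDA text maps onto the dict layout documented above, using plain floats instead of pint quantities. `parse_excitations` and both regular expressions are hypothetical, not the library's implementation. The parser reads only the state number and the eV value from each STATE line, and stores the first number after the colon of each transition line (the weight ORCA prints) as the coefficient.

```python
import re

# TD-DFT/TDA excerpt based on the ORCA example above.
SAMPLE = """\
STATE  1:  E=   0.077106 au      2.098 eV    16922.9 cm**-1 <S**2> =   2.000000 Mult 3
    90a ->  91a  :     0.468442 (c=  0.68442790)
    90b ->  91b  :     0.468442 (c= -0.68442790)

STATE  2:  E=   0.101930 au      2.774 eV    22371.1 cm**-1 <S**2> =   2.000000 Mult 3
    89a ->  91a  :     0.418245 (c=  0.64671829)
    89a ->  92a  :     0.050001 (c= -0.22360974)
    89b ->  91b  :     0.418245 (c= -0.64671829)
    89b ->  92b  :     0.050001 (c=  0.22360974)
"""

STATE_RE = re.compile(r"STATE\s+(\d+):\s+E=\s+\S+\s+au\s+(\S+)\s+eV")
TRANSITION_RE = re.compile(r"^\s*(\d+[ab])\s*->\s*(\d+[ab])\s*:\s*(\S+)")


def parse_excitations(text: str) -> dict[int, dict]:
    """Hypothetical helper mirroring the documented layout (floats, not pint)."""
    states: dict[int, dict] = {}
    current = None
    for line in text.splitlines():
        if m := STATE_RE.search(line):
            current = {"Energy (eV)": float(m.group(2)), "Transitions": []}
            states[int(m.group(1))] = current
        elif current is not None and (m := TRANSITION_RE.match(line)):
            current["Transitions"].append({
                "From Orbital": m.group(1),
                "To Orbital": m.group(2),
                # First number after the colon: the printed weight.
                "Coefficient": float(m.group(3)),
            })
    return states


states = parse_excitations(SAMPLE)
```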

class pychemparse.orca_elements.BlockOrcaTddftExcitedStatesSinglets(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaTddftExcitations

class pychemparse.orca_elements.BlockOrcaTddftTdaExcitedStates(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaTddftExcitations

class pychemparse.orca_elements.BlockOrcaTerminatedNormally(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores Termination status from ORCA output files.

Example of ORCA Output:

****ORCA TERMINATED NORMALLY****
data() Data
Returns:

pychemparse.data.Data object that contains:

  • bool Termination status

    is always True, because this block is only found when ORCA terminated normally.

Return type:

Data

data_available: bool = True

Formatted data is available for this block.

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaTimingsForIndividualModules(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the timings for individual modules from ORCA output files.

Example of ORCA Output:

Timings for individual modules:

Sum of individual times         ...      509.556 sec (=   8.493 min)
GTO integral calculation        ...        7.722 sec (=   0.129 min)   1.5 %
SCF iterations                  ...      123.801 sec (=   2.063 min)  24.3 %
SCF Gradient evaluation         ...       26.450 sec (=   0.441 min)   5.2 %
Geometry relaxation             ...        0.826 sec (=   0.014 min)   0.2 %
Analytical frequency calculation...      350.758 sec (=   5.846 min)  68.8 %
data() Data
Returns:

pychemparse.data.Data object that contains:

  • dict Timings

    with module names as keys and timings as datetime.timedelta objects.

Return type:

Data

Parsed data example:

'Sum of individual times': datetime.timedelta(seconds=24, microseconds=36000),
'GTO integral calculation': datetime.timedelta(seconds=8, microseconds=80000),
'SCF iterations': datetime.timedelta(seconds=15, microseconds=956000)
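A stdlib-only sketch of turning the "Timings for individual modules" lines into datetime.timedelta objects, in the spirit of the Timings dict described above. `parse_module_timings` and `LINE_RE` are hypothetical names; the regex assumes each line has the form "name ... seconds sec". In the example above, the per-module times add up to the printed sum to within 1 ms of rounding.

```python
import re
from datetime import timedelta

# Lines copied from the ORCA "Timings for individual modules" example above.
SAMPLE = """\
Sum of individual times         ...      509.556 sec (=   8.493 min)
GTO integral calculation        ...        7.722 sec (=   0.129 min)   1.5 %
SCF iterations                  ...      123.801 sec (=   2.063 min)  24.3 %
SCF Gradient evaluation         ...       26.450 sec (=   0.441 min)   5.2 %
Geometry relaxation             ...        0.826 sec (=   0.014 min)   0.2 %
Analytical frequency calculation...      350.758 sec (=   5.846 min)  68.8 %
"""

# Module name, the "..." separator, then the time in seconds.
LINE_RE = re.compile(r"^(.+?)\s*\.\.\.\s*([\d.]+)\s+sec")


def parse_module_timings(text: str) -> dict[str, timedelta]:
    timings = {}
    for line in text.splitlines():
        if m := LINE_RE.match(line):
            timings[m.group(1)] = timedelta(seconds=float(m.group(2)))
    return timings


timings = parse_module_timings(SAMPLE)
total = timings.pop("Sum of individual times")
# Per-module times vs. the printed sum: they differ only by rounding.
print(abs(total - sum(timings.values(), timedelta())).total_seconds())  # 0.001
```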
data_available: bool = True

Formatted data is available for this block.

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.orca_elements.BlockOrcaTotalRunTime(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores Total run time from ORCA output files.

Example of ORCA Output:

TOTAL RUN TIME: 0 days 0 hours 1 minutes 20 seconds 720 msec
data() Data
Returns:

pychemparse.data.Data object that contains:

  • datetime.timedelta Run Time

    representing the total run time in days, hours, minutes, seconds, and milliseconds.

Return type:

Data
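The run-time line maps directly onto datetime.timedelta keyword arguments. A minimal sketch, assuming the exact "N days N hours N minutes N seconds N msec" wording shown above; `parse_total_run_time` is a hypothetical helper, not part of pychemparse.

```python
import re
from datetime import timedelta

RUN_TIME_RE = re.compile(
    r"TOTAL RUN TIME:\s*(\d+)\s+days\s+(\d+)\s+hours\s+(\d+)\s+minutes\s+"
    r"(\d+)\s+seconds\s+(\d+)\s+msec"
)


def parse_total_run_time(line: str) -> timedelta:
    """Hypothetical helper: convert ORCA's TOTAL RUN TIME line to a timedelta."""
    m = RUN_TIME_RE.search(line)
    if m is None:
        raise ValueError(f"not an ORCA run-time line: {line!r}")
    days, hours, minutes, seconds, msec = map(int, m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=msec)


run_time = parse_total_run_time(
    "TOTAL RUN TIME: 0 days 0 hours 1 minutes 20 seconds 720 msec")
print(run_time)  # 0:01:20.720000
```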

data_available: bool = True

Formatted data is available for this block.

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.orca_elements.BlockOrcaTotalScfEnergy(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores Total SCF Energy from ORCA output files.

Example of ORCA Output:

----------------
TOTAL SCF ENERGY
----------------
Total Energy       :         -379.43011624 Eh          -10324.81837 eV

Components:
Nuclear Repulsion  :          376.82729155 Eh           10253.99191 eV
Electronic Energy  :         -756.25740779 Eh          -20578.81027 eV
One Electron Energy:        -1258.15590029 Eh          -34236.16258 eV
Two Electron Energy:          501.89849250 Eh           13657.35231 eV

Virial components:
Potential Energy   :         -757.03875139 Eh          -20600.07171 eV
Kinetic Energy     :          377.60863515 Eh           10275.25335 eV
Virial Ratio       :            2.00482373


DFT components:
N(Alpha)           :       31.000002566977 electrons
N(Beta)            :       31.000002566977 electrons
N(Total)           :       62.000005133953 electrons
E(X)               :      -51.506470961700 Eh
E(C)               :       -2.061628237949 Eh
E(XC)              :      -53.568099199649 Eh
DFET-embed. en.    :        0.000000000000 Eh
data() Data
Returns:

pychemparse.data.Data object that contains:

  • dict Total Energy with

    -pint.Quantity Value in Eh

    -pint.Quantity Value in eV

  • dict Components, Virial components, and DFT components (the available sections may differ between ORCA versions), each a dict mapping the printed labels to their values.

Values printed in more than one unit are stored as sub-dicts keyed by unit label (for example 'Value in Eh' and 'Value in eV'). Otherwise, the value is stored directly as a pint.Quantity (or as a plain float for dimensionless values such as the Virial Ratio).

The values in different units are expected to represent the same quantity; if they do not, the discrepancy originates in the ORCA output.

Parsed data example (ORCA 6):

{
'Total Energy': {'Value in Eh': <Quantity(-379.430116, 'hartree')>, 'Value in eV': <Quantity(-10324.8184, 'electron_volt')>},
'Components': {'Nuclear Repulsion': {'Value in Eh': <Quantity(376.827292, 'hartree')>, 'Value in eV': <Quantity(10253.9919, 'electron_volt')>}, 'Electronic Energy': {'Value in Eh': <Quantity(-756.257408, 'hartree')>, 'Value in eV': <Quantity(-20578.8103, 'electron_volt')>}, 'One Electron Energy': {'Value in Eh': <Quantity(-1258.1559, 'hartree')>, 'Value in eV': <Quantity(-34236.1626, 'electron_volt')>}, 'Two Electron Energy': {'Value in Eh': <Quantity(501.898492, 'hartree')>, 'Value in eV': <Quantity(13657.3523, 'electron_volt')>}},
'Virial components': {'Potential Energy': {'Value in Eh': <Quantity(-757.038751, 'hartree')>, 'Value in eV': <Quantity(-20600.0717, 'electron_volt')>}, 'Kinetic Energy': {'Value in Eh': <Quantity(377.608635, 'hartree')>, 'Value in eV': <Quantity(10275.2534, 'electron_volt')>}, 'Virial Ratio': 2.00482373},
'DFT components': {'N(Alpha)': <Quantity(31.0000026, 'electron')>, 'N(Beta)': <Quantity(31.0000026, 'electron')>, 'N(Total)': <Quantity(62.0000051, 'electron')>, 'E(X)': <Quantity(-51.506471, 'hartree')>, 'E(C)': <Quantity(-2.06162824, 'hartree')>, 'E(XC)': <Quantity(-53.5680992, 'hartree')>, 'DFET-embed. en.': <Quantity(0.0, 'hartree')>}
}

Return type:

Data
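Because each Eh/eV pair describes the same quantity, the printout can be checked for consistency. A stdlib-only sketch: `parse_two_unit_lines` is hypothetical, and `HARTREE_TO_EV` is the CODATA 2018 value; the eV column of the example agrees with it to a few meV, small last-digit differences presumably coming from the conversion constant ORCA uses. With the pint quantities returned by `data()`, `quantity.to("eV")` performs the same conversion. The sketch also checks the additive identities of the printout and recomputes the virial ratio as -V/T.

```python
import re

# Excerpt of the ORCA "TOTAL SCF ENERGY" example above.
SAMPLE = """\
Total Energy       :         -379.43011624 Eh          -10324.81837 eV
Nuclear Repulsion  :          376.82729155 Eh           10253.99191 eV
Electronic Energy  :         -756.25740779 Eh          -20578.81027 eV
One Electron Energy:        -1258.15590029 Eh          -34236.16258 eV
Two Electron Energy:          501.89849250 Eh           13657.35231 eV
Potential Energy   :         -757.03875139 Eh          -20600.07171 eV
Kinetic Energy     :          377.60863515 Eh           10275.25335 eV
"""

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 hartree energy in eV

LINE_RE = re.compile(r"^(.+?)\s*:\s*(\S+)\s+Eh\s+(\S+)\s+eV")


def parse_two_unit_lines(text: str) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: map each label to its (Eh, eV) pair as floats."""
    values = {}
    for line in text.splitlines():
        if m := LINE_RE.match(line):
            values[m.group(1)] = (float(m.group(2)), float(m.group(3)))
    return values


energies = parse_two_unit_lines(SAMPLE)
for label, (eh, ev) in energies.items():
    # Both columns describe the same energy; allow a few meV of slack.
    assert abs(eh * HARTREE_TO_EV - ev) < 0.01, label

eh = {label: pair[0] for label, pair in energies.items()}
# Electronic = one-electron + two-electron; total = nuclear + electronic.
assert abs(eh["One Electron Energy"] + eh["Two Electron Energy"] - eh["Electronic Energy"]) < 1e-7
assert abs(eh["Nuclear Repulsion"] + eh["Electronic Energy"] - eh["Total Energy"]) < 1e-7

virial_ratio = -eh["Potential Energy"] / eh["Kinetic Energy"]
print(round(virial_ratio, 8))  # 2.00482373
```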

data_available: bool = True

Formatted data is available for this block.

class pychemparse.orca_elements.BlockOrcaUnrecognizedHurray(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaHurray

class pychemparse.orca_elements.BlockOrcaUnrecognizedMessage(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaUnrecognizedNotification(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaUnrecognizedScf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaScfType

class pychemparse.orca_elements.BlockOrcaUnrecognizedWithHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

class pychemparse.orca_elements.BlockOrcaUnrecognizedWithSingeLineHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

class pychemparse.orca_elements.BlockOrcaUses(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaVersion(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores ORCA version from ORCA output files.

Example of ORCA Output:

        Program Version 5.0.0 -  RELEASE  -
                (SVN: $Rev: 19529$)
($Date: 2021-06-28 11:36:33 +0200 (Mo, 28 Jun 2021) $)
data() Data
Returns:

pychemparse.data.Data object that contains:

  • str Version

Return type:

Data
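Since some blocks (for example the SCF energy components) differ between ORCA versions, the version string is handy for branching. A minimal sketch assuming the "Program Version X.Y.Z" wording shown above; `parse_orca_version` is a hypothetical helper, not the library's method.

```python
import re
from typing import Optional

SAMPLE = """\
        Program Version 5.0.0 -  RELEASE  -
                (SVN: $Rev: 19529$)
($Date: 2021-06-28 11:36:33 +0200 (Mo, 28 Jun 2021) $)
"""

VERSION_RE = re.compile(r"Program Version\s+(\S+)")


def parse_orca_version(text: str) -> Optional[str]:
    """Hypothetical helper: return the version string, or None if absent."""
    m = VERSION_RE.search(text)
    return m.group(1) if m else None


version = parse_orca_version(SAMPLE)
major = int(version.split(".")[0])
print(version, major)  # 5.0.0 5
```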

data_available: bool = True
readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.orca_elements.BlockOrcaVibrationalFrequencies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockOrcaWithStandardHeader

The block captures and stores vibrational frequencies data from ORCA output files.

Example of ORCA Output:

-----------------------
VIBRATIONAL FREQUENCIES
-----------------------

Scaling factor for frequencies =  1.000000000  (already applied!)

0:         0.00 cm**-1
1:         0.00 cm**-1
2:         0.00 cm**-1
3:         0.00 cm**-1
4:         0.00 cm**-1
5:         0.00 cm**-1
6:       -15.28 cm**-1 ***imaginary mode***
7:        32.56 cm**-1
8:        38.76 cm**-1
9:        48.22 cm**-1
10:        89.12 cm**-1
11:       101.15 cm**-1
12:       114.47 cm**-1
13:       135.76 cm**-1
data() Data
Returns:

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
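The data() description above does not list its contents, so the following is only a hypothetical layout, not what pychemparse returns. The sketch reads each "index: frequency cm**-1" line and records whether ORCA flagged the mode with "***imaginary mode***". In this example, the modes printed as 0.00 cm**-1 are the projected-out translations and rotations.

```python
import re

# Lines copied from the ORCA vibrational-frequencies example above.
SAMPLE = """\
Scaling factor for frequencies =  1.000000000  (already applied!)

0:         0.00 cm**-1
1:         0.00 cm**-1
2:         0.00 cm**-1
3:         0.00 cm**-1
4:         0.00 cm**-1
5:         0.00 cm**-1
6:       -15.28 cm**-1 ***imaginary mode***
7:        32.56 cm**-1
8:        38.76 cm**-1
9:        48.22 cm**-1
10:        89.12 cm**-1
11:       101.15 cm**-1
12:       114.47 cm**-1
13:       135.76 cm**-1
"""

MODE_RE = re.compile(
    r"^\s*(\d+):\s+(-?\d+\.\d+)\s+cm\*\*-1(\s+\*\*\*imaginary mode\*\*\*)?"
)


def parse_frequencies(text: str) -> list[dict]:
    """Hypothetical helper: one dict per vibrational mode."""
    modes = []
    for line in text.splitlines():
        if m := MODE_RE.match(line):
            modes.append({
                "Mode": int(m.group(1)),
                "Frequency (cm-1)": float(m.group(2)),
                "Imaginary": m.group(3) is not None,
            })
    return modes


modes = parse_frequencies(SAMPLE)
imaginary = [mode["Mode"] for mode in modes if mode["Imaginary"]]
zero_modes = [mode["Mode"] for mode in modes if mode["Frequency (cm-1)"] == 0.0]
print(imaginary)  # [6]
```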

class pychemparse.orca_elements.BlockOrcaWarnings(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

class pychemparse.orca_elements.BlockOrcaWithStandardHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

Handles blocks with a standard header format by extending the Block class.

This class is designed to process data blocks that come with a standardized header marked by lines of repeating special characters (e.g., ‘-’, ‘*’, ‘#’). It overrides the extract_name_header_and_body method to parse these headers, facilitating the separation of the block into name, header, and body components for easier readability and manipulation.

Parameters

None

Methods

extract_name_header_and_body()

Parses the block’s content to extract the name, header (if present), and body, adhering to a standard header format.

Raises

Warning

If the block’s content does not contain a recognizable header, indicating that the format may not conform to expectations.

extract_name_header_and_body() tuple[str, str | None, str]

Identifies and separates the name, header, and body of the block based on a standard header format.

Utilizes regular expressions to discern the header portion from the body, processing the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.

Returns

tuple[str, str | None, str]

The name of the block, the header content (or None if a header is not present), and the body of the block.
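A sketch of the name/header/body split for a standard header, assuming the header is a rule of one repeated special character, the title line(s), then an identical rule. `HEADER_RE` and `split_standard_header` are hypothetical; the library's own regular expressions may differ (for example in how unequal rule lengths are handled).

```python
import re
from typing import Optional

# Hypothetical pattern: rule of a repeated special character, title, same rule.
HEADER_RE = re.compile(
    r"\A\s*(?P<rule>([-*#=])\2{3,})[ \t]*\n(?P<title>.*?)\n[ \t]*(?P=rule)[ \t]*\n?",
    re.DOTALL,
)


def split_standard_header(text: str) -> tuple[str, Optional[str], str]:
    """Return (name, header, body); header is None if no header is found."""
    m = HEADER_RE.match(text)
    if m is None:
        # No recognizable header: fall back to the first line as the name.
        first, _, rest = text.strip().partition("\n")
        return first.strip(), None, rest
    header = m.group("title").strip()
    name = header.splitlines()[0].strip()
    return name, header, text[m.end():]


block = (
    "----------------\n"
    "TOTAL SCF ENERGY\n"
    "----------------\n"
    "Total Energy       :         -379.43011624 Eh\n"
)
name, header, body = split_standard_header(block)
print(name)  # TOTAL SCF ENERGY
```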

pychemparse.gpaw_elements module

class pychemparse.gpaw_elements.AvailableBlocksGpaw

Bases: AvailableBlocksGeneral

A class to store all available blocks for GPAW.

blocks: dict[str, type[Element]] = {'BlockGpawConvergedAfter': <class 'pychemparse.gpaw_elements.BlockGpawConvergedAfter'>, 'BlockGpawDipole': <class 'pychemparse.gpaw_elements.BlockGpawDipole'>, 'BlockGpawEnergyContributions': <class 'pychemparse.gpaw_elements.BlockGpawEnergyContributions'>, 'BlockGpawIcon': <class 'pychemparse.gpaw_elements.BlockGpawIcon'>, 'BlockGpawOrbitalEnergies': <class 'pychemparse.gpaw_elements.BlockGpawOrbitalEnergies'>, 'BlockGpawTiming': <class 'pychemparse.gpaw_elements.BlockGpawTiming'>}
class pychemparse.gpaw_elements.BlockGpawConvergedAfter(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the "Converged after N iterations" line from GPAW output files.

Example of GPAW Output:

Converged after 12 iterations.
data() Data
Returns:

pychemparse.data.Data object that contains:

  • int Iterations

  • bool Converged is always True, as the block is only extracted if the calculation is converged

Parsed data example:

{'Iterations': 12, 'Converged': True}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
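The parsed data example above corresponds to a single regular-expression match. A minimal sketch; `parse_converged_after` is a hypothetical helper, not the library's implementation.

```python
import re

CONVERGED_RE = re.compile(r"Converged after\s+(\d+)\s+iterations")


def parse_converged_after(text: str) -> dict:
    """Hypothetical helper producing the dict shown in the example above."""
    m = CONVERGED_RE.search(text)
    if m is None:
        raise ValueError("no 'Converged after' line found")
    # The line only appears for converged runs, so Converged is always True.
    return {"Iterations": int(m.group(1)), "Converged": True}


result = parse_converged_after("Converged after 12 iterations.")
print(result)  # {'Iterations': 12, 'Converged': True}
```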

class pychemparse.gpaw_elements.BlockGpawDipole(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the dipole moment from GPAW output files.

Example of GPAW Output:

Dipole moment: (-0.000000, 0.000000, -1.948262) |e|*Ang
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pint.Quantity Dipole Moment in |e|*Ang. Can be converted to Debye with .to('D').

Parsed data example:

{'Dipole Moment': <Quantity([ 0.       -0.       -1.128191], 'angstrom * elementary_charge')>}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
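The conversion to debye mentioned above can be reproduced by hand: 1 D is exactly 1e-21/c C·m, so 1 e·Å is about 4.8032 D. A stdlib-only sketch of parsing the dipole line and converting its magnitude; `parse_dipole` and the constant names are hypothetical.

```python
import math
import re

ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact in SI since 2019)
SPEED_OF_LIGHT = 299792458.0  # m/s (exact)
DEBYE = 1e-21 / SPEED_OF_LIGHT  # C*m
E_ANGSTROM_TO_DEBYE = ELEMENTARY_CHARGE * 1e-10 / DEBYE

DIPOLE_RE = re.compile(r"Dipole moment:\s*\(([^)]*)\)\s*\|e\|\*Ang")


def parse_dipole(line: str) -> tuple[float, ...]:
    """Hypothetical helper: return the (x, y, z) components in e*Angstrom."""
    m = DIPOLE_RE.search(line)
    if m is None:
        raise ValueError(f"not a GPAW dipole line: {line!r}")
    return tuple(float(x) for x in m.group(1).split(","))


vec = parse_dipole("Dipole moment: (-0.000000, 0.000000, -1.948262) |e|*Ang")
magnitude = math.sqrt(sum(c * c for c in vec))
print(f"{magnitude:.6f} e*Ang = {magnitude * E_ANGSTROM_TO_DEBYE:.3f} D")
```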

class pychemparse.gpaw_elements.BlockGpawEnergyContributions(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores Energy contributions from GPAW output files.

Example of GPAW Output:

Energy contributions relative to reference atoms: (reference = -10231.780790)

Kinetic:       +111.119958
Potential:     -114.654058
External:        +0.000000
XC:             -93.096053
Entropy (-ST):   +0.000000
Local:           +0.390037
--------------------------
Free energy:    -96.240117
Extrapolated:   -96.240117
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pint.Quantity Reference in eV

  • pint.Quantity Free energy in eV

  • pint.Quantity Extrapolated in eV

  • dict Contributions, mapping each energy term to a pint.Quantity in eV

Parsed data example:

{'Contributions': {'Kinetic': <Quantity(106.291868, 'electron_volt')>, 
                   'Potential': <Quantity(-113.401291, 'electron_volt')>,
                   'External': <Quantity(0.0, 'electron_volt')>,
                   'XC': <Quantity(-93.210989, 'electron_volt')>,
                   'Entropy (-ST)': <Quantity(0.0, 'electron_volt')>, 
                   'Local': <Quantity(0.39059, 'electron_volt')>},
'Reference': <Quantity(-10231.7808, 'electron_volt')>,
'Free energy': <Quantity(-99.929821, 'electron_volt')>,
'Extrapolated': <Quantity(-99.929821, 'electron_volt')>
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
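In the GPAW example above, the individual contributions sum to the printed free energy (to within the last printed digit). A stdlib-only sketch that parses the block into plain floats in eV and checks this; `parse_energy_contributions` and both regular expressions are hypothetical, and the library stores pint quantities instead.

```python
import re

# Copied from the GPAW energy-contributions example above.
SAMPLE = """\
Energy contributions relative to reference atoms: (reference = -10231.780790)

Kinetic:       +111.119958
Potential:     -114.654058
External:        +0.000000
XC:             -93.096053
Entropy (-ST):   +0.000000
Local:           +0.390037
--------------------------
Free energy:    -96.240117
Extrapolated:   -96.240117
"""

REFERENCE_RE = re.compile(r"\(reference\s*=\s*([-+]?\d+\.\d+)\)")
TERM_RE = re.compile(r"^\s*([^:]+):\s*([-+]?\d+\.\d+)\s*$")


def parse_energy_contributions(text: str) -> dict:
    """Hypothetical helper: floats in eV, keyed like the example above."""
    result = {"Contributions": {}}
    if m := REFERENCE_RE.search(text):
        result["Reference"] = float(m.group(1))
    for line in text.splitlines():
        if m := TERM_RE.match(line):
            label, value = m.group(1).strip(), float(m.group(2))
            if label in ("Free energy", "Extrapolated"):
                result[label] = value
            else:
                result["Contributions"][label] = value
    return result


parsed = parse_energy_contributions(SAMPLE)
total = sum(parsed["Contributions"].values())
print(round(total, 6), parsed["Free energy"])
```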

class pychemparse.gpaw_elements.BlockGpawIcon(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

data() Data

The GPAW icon is ASCII art; there is nothing to extract beyond the symbols themselves.

data_available: bool = True
extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

readable_name() str

Generate a readable name for the block based on its content.

Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.

Returns:

The extracted name of the block.

Return type:

str

class pychemparse.gpaw_elements.BlockGpawOrbitalEnergies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores orbital energies (eigenvalues) and occupancies from GPAW output files.

Example of GPAW Output:

                        Up                     Down
Band  Eigenvalues  Occupancy  Eigenvalues  Occupancy
    0    -24.42908    1.00000    -24.57211    1.00000
    1    -22.16252    1.00000    -22.18228    1.00000
    2    -21.55401    1.00000    -21.60131    1.00000
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pandas.DataFrame UpDownOrbitals with columns: Band, Eigenvalues_Up, Occupancy_Up, Eigenvalues_Down, Occupancy_Down. Eigenvalues are in eV.

Parsed data example:

{'UpDownOrbitals':      Band  Eigenvalues_Up  Occupancy_Up  Eigenvalues_Down  Occupancy_Down
0       0       -24.42908           1.0         -24.57211             1.0
1       1       -22.16252           1.0         -22.18228             1.0
2       2       -21.55401           1.0         -21.60131             1.0
3       3       -19.15063           1.0         -19.19201             1.0
4       4       -19.10920           1.0         -19.10168             1.0
..    ...             ...           ...               ...             ...
247   247        81.59782           0.0          81.62746             0.0
248   248        81.85757           0.0          81.83158             0.0
249   249        83.60243           0.0          83.51849             0.0
250   250        87.94628           0.0          87.90765             0.0
251   251        95.86929           0.0          95.86901             0.0

[252 rows x 5 columns]}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
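A stdlib-only sketch of reading the spin-resolved table into rows that use the documented column names. With pandas installed, `pandas.DataFrame(rows)` builds an equivalent table. `parse_up_down` is hypothetical, and treating an occupancy above 0.5 as "occupied" is an assumption of this sketch.

```python
# Copied from the GPAW orbital-energies example above.
SAMPLE = """\
                        Up                     Down
Band  Eigenvalues  Occupancy  Eigenvalues  Occupancy
    0    -24.42908    1.00000    -24.57211    1.00000
    1    -22.16252    1.00000    -22.18228    1.00000
    2    -21.55401    1.00000    -21.60131    1.00000
"""

COLUMNS = ("Band", "Eigenvalues_Up", "Occupancy_Up",
           "Eigenvalues_Down", "Occupancy_Down")


def parse_up_down(text: str) -> list[dict]:
    """Hypothetical helper: one dict per band; eigenvalues in eV."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the two header lines; data rows start with an integer band index.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        band, *numbers = fields
        rows.append(dict(zip(COLUMNS, [int(band), *map(float, numbers)])))
    return rows


rows = parse_up_down(SAMPLE)
# Highest occupied spin-up eigenvalue (occupancy threshold is an assumption).
homo_up = max(r["Eigenvalues_Up"] for r in rows if r["Occupancy_Up"] > 0.5)
print(homo_up)  # -21.55401
```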

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

class pychemparse.gpaw_elements.BlockGpawTiming(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores Timing from GPAW output files.

Example of GPAW Output:

        Timing:                               incl.     excl.
------------------------------------------------------------
Basic WFS set positions:              0.000     0.000   0.0% |
 Redistribute:                        0.000     0.000   0.0% |
Basis functions set positions:        0.003     0.003   0.0% |
...
ST tci:                               0.001     0.001   0.0% |
Set symmetry:                         0.000     0.000   0.0% |
TCI: Evaluate splines:                0.182     0.182   2.4% ||
mktci:                                0.001     0.001   0.0% |
Other:                                0.803     0.803  10.5% |---|
------------------------------------------------------------
Total:                                          7.661 100.0%
data() Data

Parses the timing data, maintains the hierarchy, and extracts the total time separately.

Returns:

pychemparse.data.Data object that contains:

  • Total: A dictionary with ‘Total Time’ and ‘Percentage’.

  • TimingHierarchy: A list of timing data entries maintaining the hierarchy.

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
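A sketch of parsing the timing table while keeping the hierarchy, assuming (as the column alignment in the example suggests) that GPAW marks nesting with one leading space per level. Only the 'Total Time' and 'Percentage' keys come from the description above; `parse_gpaw_timing` and the entry keys ("Name", "Parent", "Depth", and so on) are hypothetical.

```python
import re

# Excerpt of the GPAW timing example above; " Redistribute" is nested once.
SAMPLE = (
    "Basic WFS set positions:              0.000     0.000   0.0% |\n"
    " Redistribute:                        0.000     0.000   0.0% |\n"
    "Basis functions set positions:        0.003     0.003   0.0% |\n"
    "Other:                                0.803     0.803  10.5% |---|\n"
    "Total:                                          7.661 100.0%\n"
)

TIMING_RE = re.compile(
    r"^(?P<indent> *)(?P<name>\S.*?):\s+(?P<incl>\d+\.\d+)\s+"
    r"(?P<excl>\d+\.\d+)\s+(?P<pct>\d+\.\d+)%"
)
TOTAL_RE = re.compile(r"^Total:\s+(\d+\.\d+)\s+(\d+\.\d+)%")


def parse_gpaw_timing(text: str) -> dict:
    entries, stack = [], []  # stack holds (depth, name) of open ancestors
    total = None
    for line in text.splitlines():
        if m := TOTAL_RE.match(line):
            total = {"Total Time": float(m.group(1)), "Percentage": float(m.group(2))}
        elif m := TIMING_RE.match(line):
            depth = len(m.group("indent"))
            # Close ancestors at the same or deeper level.
            while stack and stack[-1][0] >= depth:
                stack.pop()
            entries.append({
                "Name": m.group("name"),
                "Parent": stack[-1][1] if stack else None,
                "Depth": depth,
                "Incl": float(m.group("incl")),
                "Excl": float(m.group("excl")),
                "Percent": float(m.group("pct")),
            })
            stack.append((depth, m.group("name")))
    return {"Total": total, "TimingHierarchy": entries}


timing = parse_gpaw_timing(SAMPLE)
```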

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]

pychemparse.vasp_elements module

class pychemparse.vasp_elements.AvailableBlocksVasp

Bases: AvailableBlocksGeneral

A class to store all available blocks for VASP.

blocks: dict[str, type[Element]] = {'BlockVaspFreeEnergyOfTheIonElectronSystem': <class 'pychemparse.vasp_elements.BlockVaspFreeEnergyOfTheIonElectronSystem'>, 'BlockVaspGeneralTiming': <class 'pychemparse.vasp_elements.BlockVaspGeneralTiming'>, 'BlockVaspWarning': <class 'pychemparse.vasp_elements.BlockVaspWarning'>, 'BlockVaspWithSingleLineHeader': <class 'pychemparse.vasp_elements.BlockVaspWithSingleLineHeader'>, 'BlockVaspWithStandardHeader': <class 'pychemparse.vasp_elements.BlockVaspWithStandardHeader'>}
class pychemparse.vasp_elements.BlockVaspFreeEnergyOfTheIonElectronSystem(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockVaspWithSingleLineHeader

The block captures and stores the free energy of the ion-electron system and its components from VASP output files.

Example of VASP Output:

Free energy of the ion-electron system (eV)
---------------------------------------------------
alpha Z        PSCENC =       856.26359874
Ewald energy   TEWEN  =    124561.82273922
-Hartree energ DENC   =   -158586.56090100
-exchange      EXHF   =         0.00000000
-V(xc)+E(xc)   XCENC  =      1621.64044307
PAW double counting   =     40935.10832877   -40536.82457645
entropy T*S    EENTRO =        -0.11542442
eigenvalues    EBANDS =     -6251.33904632
atomic energy  EATOM  =     37032.80098409
Solvation  Ediel_sol  =         0.00000000
---------------------------------------------------
free energy    TOTEN  =      -367.20385430 eV

energy without entropy =     -367.08842988  energy(sigma->0) =     -367.14614209
data() Data
Returns:

pychemparse.data.Data object that contains:

  • pint.Quantity objects for the energy components, in eV

  • tuples of pint.Quantity objects for the PAW double counting terms, in eV, if present in the block

Parsed data example:

{'alpha Z        PSCENC': 856.26359874 <Unit('electron_volt')>,
'Ewald energy   TEWEN': 124531.99989886 <Unit('electron_volt')>,
'-Hartree energ DENC': -158146.11578475 <Unit('electron_volt')>,
'-exchange      EXHF': 0.0 <Unit('electron_volt')>,
'-V(xc)+E(xc)   XCENC': 1631.52209578 <Unit('electron_volt')>,
'PAW double counting': (29408.33949787 <Unit('electron_volt')>,
-29013.20232444 <Unit('electron_volt')>),
'entropy T*S    EENTRO': -0.07591362 <Unit('electron_volt')>,
'eigenvalues    EBANDS': -1504.38797381 <Unit('electron_volt')>,
'atomic energy  EATOM': 37032.80098409 <Unit('electron_volt')>,
'Solvation  Ediel_sol': 0.0 <Unit('electron_volt')>,
'free energy    TOTEN': 4797.14407872 <Unit('electron_volt')>,
'energy without entropy': 4797.21999234 <Unit('electron_volt')>,
'energy(sigma->0)': 4797.18203553 <Unit('electron_volt')>
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
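A stdlib-only sketch of reading the "label = value(s)" pairs of this block, keeping VASP's labels verbatim as in the parsed example above but using plain floats in eV instead of pint quantities. `parse_free_energy_block` and `ENTRY_RE` are hypothetical; the pattern allows several numbers per entry (PAW double counting) and several entries per line (the last line).

```python
import re

# Copied from the VASP example above.
SAMPLE = """\
Free energy of the ion-electron system (eV)
---------------------------------------------------
alpha Z        PSCENC =       856.26359874
Ewald energy   TEWEN  =    124561.82273922
-Hartree energ DENC   =   -158586.56090100
-exchange      EXHF   =         0.00000000
-V(xc)+E(xc)   XCENC  =      1621.64044307
PAW double counting   =     40935.10832877   -40536.82457645
entropy T*S    EENTRO =        -0.11542442
eigenvalues    EBANDS =     -6251.33904632
atomic energy  EATOM  =     37032.80098409
Solvation  Ediel_sol  =         0.00000000
---------------------------------------------------
free energy    TOTEN  =      -367.20385430 eV

energy without entropy =     -367.08842988  energy(sigma->0) =     -367.14614209
"""

# A label (no "=" inside), "=", then one or more decimal numbers.
ENTRY_RE = re.compile(
    r"([-A-Za-z(][^=\n]*?)\s*=\s*([-+]?\d+\.\d+(?:\s+[-+]?\d+\.\d+)*)"
)


def parse_free_energy_block(text: str) -> dict:
    """Hypothetical helper: float, or tuple of floats, per VASP label."""
    values = {}
    for line in text.splitlines():
        for m in ENTRY_RE.finditer(line):
            numbers = tuple(float(x) for x in m.group(2).split())
            values[m.group(1)] = numbers[0] if len(numbers) == 1 else numbers
    return values


energies = parse_free_energy_block(SAMPLE)
print(energies["free energy    TOTEN"])  # -367.2038543
```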

class pychemparse.vasp_elements.BlockVaspGeneralTiming(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

The block captures and stores the general timing and accounting information from VASP output files.

Example of VASP Output:

General timing and accounting informations for this job:
========================================================

          Total CPU time used (sec):     1410.943
                    User time (sec):     1394.056
                  System time (sec):       16.888
                 Elapsed time (sec):     1460.875

           Maximum memory used (kb):      201324.
           Average memory used (kb):          N/A

                  Minor page faults:       310377
                  Major page faults:          212
         Voluntary context switches:         5646
data() Data
Returns:

pychemparse.data.Data object that contains:

  • datetime.timedelta objects for time components given in seconds

  • bitmath.Byte objects for memory components in bytes

  • bitmath.kB objects for memory components in kilobytes

  • bitmath.MB objects for memory components in megabytes

  • bitmath.GB objects for memory components in gigabytes

  • pint.Quantity objects for other components with units

  • the string 'N/A' for non-applicable values

  • str for other values

Parsed data example:

{'Total CPU time used': datetime.timedelta(seconds=1410, microseconds=943000),
'User time': datetime.timedelta(seconds=1394, microseconds=56000),
'System time': datetime.timedelta(seconds=16, microseconds=888000),
'Elapsed time': datetime.timedelta(seconds=1460, microseconds=875000),
'Maximum memory used': kB(201324.0),
'Average memory used': 'N/A',
'Minor page faults': '310377',
'Major page faults': '212',
'Voluntary context switches': '5646'
}

Return type:

Data

data_available: bool = True

Formatted data is available for this block.
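The conversion from raw timing lines to Python values can be sketched with the standard library alone. This is a hypothetical re-implementation for illustration, not pychemparse's own parser: the parse_timing helper and its line pattern are assumptions, and memory values stay plain floats in kilobytes instead of bitmath objects.

```python
import re
from datetime import timedelta

# Shortened copy of the VASP timing block shown above
RAW = """\
General timing and accounting informations for this job:
========================================================

          Total CPU time used (sec):     1410.943
                 Elapsed time (sec):     1460.875

           Maximum memory used (kb):      201324.
           Average memory used (kb):          N/A

                  Minor page faults:       310377
"""

# Hypothetical line pattern: "Name (unit):   value", where "(unit)" is optional
LINE = re.compile(r"^\s*(?P<name>[^():]+?)\s*(?:\((?P<unit>\w+)\))?:\s*(?P<value>\S+)\s*$")


def parse_timing(raw: str) -> dict:
    """Hypothetical helper: convert each 'Name (unit): value' line to a Python value."""
    parsed = {}
    for line in raw.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # blank lines, the title line and the '=' rule carry no value
        name, unit, value = match["name"], match["unit"], match["value"]
        if value == "N/A":
            parsed[name] = "N/A"
        elif unit == "sec":
            parsed[name] = timedelta(seconds=float(value))
        elif unit == "kb":
            parsed[name] = float(value)  # kilobytes; pychemparse returns bitmath.kB here
        else:
            parsed[name] = value  # counters such as page faults stay strings
    return parsed


timing = parse_timing(RAW)
# Ratio of CPU time to wall-clock time, a rough measure of parallel utilization
utilization = timing["Total CPU time used"] / timing["Elapsed time"]
```

Because both entries are datetime.timedelta objects, dividing one by the other directly yields a float.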

class pychemparse.vasp_elements.BlockVaspWarning(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

Captures and stores warning messages from VASP output files.

Example of VASP Output:

 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     For optimal performance we recommend to set                             |
|       NCORE = 2 up to number-of-cores-per-socket                            |
|     NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE).      |
|     This setting can greatly improve the performance of VASP for DFT.       |
|     The default, NCORE=1 might be grossly inefficient on modern             |
|     multi-core architectures or massively parallel machines. Do your        |
|     own testing! More info at https://www.vasp.at/wiki/index.php/NCORE      |
|     Unfortunately you need to use the default for GW and RPA                |
|     calculations (for HF NCORE is supported but not extensively tested      |
|     yet).                                                                   |
|                                                                             |
-----------------------------------------------------------------------------

extract_name_header_and_body() tuple[str, str | None, str]

Extract the block’s name, header, and body components.

Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.

Returns:

A tuple containing the block’s name, optional header, and body content.

Return type:

tuple[str, str | None, str]
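The warning text sits inside a box drawn with '|' and '-' characters, beneath a large ASCII-art banner. A minimal sketch of how the readable message could be recovered with plain string handling; the warning_message helper and its banner-character heuristic are assumptions for illustration, not the logic BlockVaspWarning uses.

```python
# Characters that make up the "WARNING !!!" ASCII-art banner, plus spaces
BANNER_CHARS = set("WARNIG! ")


def warning_message(raw: str) -> str:
    """Hypothetical helper: join the message lines of a boxed VASP warning."""
    message_lines = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not (stripped.startswith("|") and stripped.endswith("|")):
            continue  # top and bottom '-----' borders
        inner = stripped[1:-1].strip()
        if not inner or set(inner) <= BANNER_CHARS:
            continue  # empty padding rows and banner-art rows
        message_lines.append(inner)
    return " ".join(message_lines)


# Shortened version of the warning box shown above
SAMPLE = """\
 --------------------------------------------
|                                            |
|     W    W    AA    RRRRR   N    N   !!!   |
|                                            |
|     For optimal performance we recommend   |
|     to set NCORE = 2.                      |
 --------------------------------------------
"""

message = warning_message(SAMPLE)
```

The heuristic would also drop a genuine message line made only of the letters W, A, R, N, I, G and '!', so it is a rough approximation.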

class pychemparse.vasp_elements.BlockVaspWithSingleLineHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: BlockVaspWithStandardHeader

class pychemparse.vasp_elements.BlockVaspWithStandardHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)

Bases: Block

Handles blocks with a standard header format by extending the Block class.

This class is designed to process data blocks that come with a standardized header marked by lines of repeating special characters (e.g., ‘-’, ‘*’, ‘#’). It overrides the extract_name_header_and_body method to parse these headers, facilitating the separation of the block into name, header, and body components for easier readability and manipulation.

Parameters

None beyond those inherited from Block (raw_data, char_position, line_position).

Methods

extract_name_header_and_body()

Parses the block’s content to extract the name, header (if present), and body, adhering to a standard header format.

Raises

Warning

If the block’s content does not contain a recognizable header, indicating that the format may not conform to expectations.

extract_name_header_and_body() tuple[str, str | None, str]

Identifies and separates the name, header, and body of the block based on a standard header format.

Uses regular expressions to separate the header from the body, then processes the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.

Returns

tuple[str, str | None, str]

The name of the block, the header content (or None if a header is not present), and the body of the block.
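A standard header of this kind can be split off with a single regular expression. The sketch below is an assumption about the general approach, not the pattern BlockVaspWithStandardHeader actually uses: HEADER and split_block are hypothetical names, the header is taken to be a title framed above and below by five or more repeated '-', '*', '#' or '=' characters, and the fallback returns None for the header silently instead of issuing a Warning.

```python
import re
from typing import Optional

# Hypothetical header pattern: title line(s) framed by two rules of 5+ identical
# special characters; the bottom rule must use the same character as the top one.
HEADER = re.compile(
    r"\A\s*(?P<rule>[-*#=])(?P=rule){4,}[ \t]*\n"  # top rule
    r"(?P<header>(?:.*\n)*?)"                       # title line(s), matched lazily
    r"[ \t]*(?P=rule){5,}[ \t]*(?:\n|\Z)"           # bottom rule
)


def split_block(raw: str) -> tuple[str, Optional[str], str]:
    """Hypothetical helper returning (name, header, body)."""
    match = HEADER.match(raw)
    if match is None:
        # No framed header: treat the first line as the name and the rest as the body
        first, _, rest = raw.lstrip("\n").partition("\n")
        return first.strip(), None, rest
    header = match.group("header").strip()
    name = header.splitlines()[0].strip() if header else ""
    return name, header, raw[match.end():]


name, header, body = split_block(
    "-------------------\nTOTAL SCF ENERGY\n-------------------\nbody line\n"
)
```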

pychemparse.regex_request module

class pychemparse.regex_request.RegexRequest(p_type: str, p_subtype: str, pattern: str, flags: list[str], comment: str = '')

Bases: object

Encapsulates a regular expression request for parsing structured text.

This class pairs a regular expression pattern with metadata describing the kind of element it extracts. Applying the pattern to text segments yields structured information based on the matches.

Variables:
  • p_type (str) – The general type of the regex request, often corresponding to a high-level category such as ‘Block’ or ‘Element’.

  • p_subtype (str) – A more specific identifier within the broader type, providing additional context or classification.

  • pattern (str) – The actual regular expression pattern used for matching text.

  • flags (int) – The combined regex flags compiled into an integer, determining how the regex pattern is applied.

  • comment (str) – An optional description or note about the purpose or nature of the regex request.

apply(marked_text: list[tuple[tuple[int, int], tuple[int, int], Element]] | str, mode: str = 'ORCA', show_progress: bool = False) tuple[str, dict[str, dict]]

Applies the regex pattern to marked text or a raw string to identify and extract elements based on the pattern.

This method iterates over the marked text or processes a string to identify matches to the regex pattern. Extracted elements are then organized based on their positions within the text.

Parameters:
  • marked_text (Union[list[tuple[tuple[int, int], tuple[int, int], Element]], str]) – The marked text or raw string to which the regex pattern will be applied. Marked text should be a list of tuples, each containing character positions, line numbers, and an associated Element. A raw string is first converted to marked text.

  • mode (str) – The operational mode for element extraction, indicating the type of data being processed (e.g., ‘ORCA’, ‘GPAW’, or ‘VASP’).

  • show_progress (bool) – Indicates whether a progress indicator should be shown during the extraction process. Useful for long-running operations.

Returns:

A tuple containing the updated marked text and a dictionary mapping extracted elements to their positions.

Return type:

tuple[str, dict[str, dict]]
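The positional part of the marked-text tuples can be illustrated by deriving character and line spans from each match. A minimal sketch, assuming 0-based, inclusive line numbers; find_with_positions is a hypothetical helper, and the real apply() works on marked text and produces Element objects, which this sketch omits.

```python
import re


def find_with_positions(pattern: re.Pattern, text: str):
    """Hypothetical helper: ((char_start, char_end), (first_line, last_line), matched_text).

    Character offsets are half-open, as in re.Match.span(); line numbers are
    0-based and inclusive (an assumed convention, not necessarily pychemparse's).
    """
    found = []
    for match in pattern.finditer(text):
        start, end = match.span()
        first_line = text.count("\n", 0, start)
        # Count only up to the last matched character, so a trailing newline
        # does not push the match onto the following line
        last_line = text.count("\n", 0, max(start, end - 1))
        found.append(((start, end), (first_line, last_line), match.group(0)))
    return found


text = "header\nBLOCK one\nBLOCK two\nfooter\n"
blocks = find_with_positions(re.compile(r"^BLOCK.*\n", re.MULTILINE), text)
```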

compile() Pattern

Compiles the regex pattern with the specified flags into a regex pattern object.

This compiled object can be used for various regex operations like findall, search, match, etc., enabling efficient pattern matching.

Returns:

A compiled regex pattern object, ready for use in pattern matching operations.

Return type:

Pattern

to_dict() dict[str, str | list[str]]

Converts the RegexRequest instance to a dictionary, including flag names as strings.

Returns:

A dictionary representation of the RegexRequest, with keys for type, subtype, pattern, flags (as a list of strings), and optional comment.

Return type:

dict[str, Union[str, list[str]]]
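RegexRequest receives flags as a list of names but stores them as one combined integer, and to_dict() turns them back into names. A minimal sketch of that round trip with the re module; flags_to_int, int_to_flags, and the KNOWN_FLAGS subset are hypothetical.

```python
import operator
import re
from functools import reduce

# Flag names accepted in this sketch (an assumed subset of the re module's flags)
KNOWN_FLAGS = ("IGNORECASE", "MULTILINE", "DOTALL", "VERBOSE", "ASCII")


def flags_to_int(names: list[str]) -> int:
    """Combine flag names such as ["MULTILINE", "DOTALL"] into one integer."""
    unknown = [n for n in names if n not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown regex flag(s): {unknown}")
    return reduce(operator.or_, (getattr(re, n) for n in names), 0)


def int_to_flags(flags: int) -> list[str]:
    """Recover the flag names, e.g. for a to_dict()-style serialization."""
    return [n for n in KNOWN_FLAGS if flags & getattr(re, n)]


flags = flags_to_int(["MULTILINE", "DOTALL"])
compiled = re.compile(r"^a.b$", flags)  # the combined integer is what re.compile expects
```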

validate_configuration() None

Validates the RegexRequest configuration.

Currently a placeholder: it performs no checks and exists for potential future validation requirements.

pychemparse.regex_settings module

pychemparse.regex_settings.DEFAULT_GPAW_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/gpaw_regex.json'

Path to the default GPAW regex settings JSON file, included with the package.

Type:

str

pychemparse.regex_settings.DEFAULT_GPAW_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])

The pre-loaded RegexSettings instance containing the default regex patterns for GPAW output parsing.

Type:

RegexSettings

pychemparse.regex_settings.DEFAULT_ORCA_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/orca_regex.json'

Path to the default ORCA regex settings JSON file, included with the package.

Type:

str

pychemparse.regex_settings.DEFAULT_ORCA_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])

The pre-loaded RegexSettings instance containing the default regex patterns for ORCA output parsing.

Type:

RegexSettings

pychemparse.regex_settings.DEFAULT_VASP_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/vasp_regex.json'

Path to the default VASP regex settings JSON file, included with the package.

Type:

str

pychemparse.regex_settings.DEFAULT_VASP_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])

The pre-loaded RegexSettings instance containing the default regex patterns for VASP output parsing.

Type:

RegexSettings

class pychemparse.regex_settings.RegexBlueprint(order: list[str], pattern_structure: dict[str, str], pattern_texts: dict[str, str], comment: str)

Bases: object

A class representing a blueprint for generating multiple RegexRequest objects with a shared structure.

RegexBlueprint defines one common structure for several regex patterns that share a format and differ only in a specific text snippet. Given a pattern structure and a set of pattern texts, it generates one RegexRequest per text, each following the same structure.

Parameters:
  • order (list[str]) – The ordered list of keys that defines the sequence of generated RegexRequest objects.

  • pattern_structure (dict[str, str]) – A dictionary defining the common structure of regex patterns, including beginning, ending, and flags keys.

  • pattern_texts (dict[str, str]) – A dictionary mapping each key in the order list to a specific text snippet to be inserted into the pattern structure.

  • comment (str) – A comment or description associated with this blueprint.

Attributes

order: list[str]

The ordered list of keys that defines the sequence of generated RegexRequest objects.

pattern_structure: dict[str, str]

A dictionary defining the common structure of regex patterns, including ‘beginning’, ‘ending’, and ‘flags’ keys.

pattern_texts: dict[str, str]

A dictionary mapping each key in the ‘order’ list to a specific text snippet to be inserted into the pattern structure.

comment: str

A comment or description associated with this blueprint.

Examples

import pychemparse as chp

# Define a blueprint for extracting multiple blocks of text from an ORCA output file
rb = chp.RegexBlueprint(
    order=[
        "BlockOrcaVersion",
        "BlockOrcaContributions",
    ],
    pattern_structure={
        "beginning": r"^([ \t]*",
        "ending": r".*?\n(?:^(?!^[ \t]*[\-\*\#\=]{5,}.*$|^[ \t]*$).*\n)*)",
        "flags": ["MULTILINE"],
    },
    pattern_texts={
        "BlockOrcaVersion": "Program Version",
        "BlockOrcaContributions": "With contributions from"
    },
    comment="Blueprint: Paragraph with the line that starts with specified text.",
)

# Validate the blueprint configuration first (raises ValueError if it is inconsistent)
rb.validate_configuration()

# Generate a list of RegexRequest objects based on the blueprint
regex_requests = rb.to_list()

add_item(name: str, pattern_text: str) None

Adds a new item to the blueprint, updating the pattern texts, items, and order accordingly.

Parameters:
  • name (str) – The key for the new pattern text.

  • pattern_text (str) – The text snippet to be inserted into the pattern structure for the new item.

to_dict() dict[str, list[str] | dict[str, str | list[str]] | str]

Converts the RegexBlueprint instance into a dictionary representation.

Returns:

A dictionary containing the blueprint’s ordered keys, pattern structure, pattern texts, and an optional comment.

Return type:

dict[str, list[str] | dict[str, str | list[str]] | str]

to_list() list[RegexRequest]

Converts the blueprint’s items into a list of RegexRequest objects ordered according to the blueprint’s order attribute.

Returns:

An ordered list of RegexRequest objects generated from the blueprint.

Return type:

list[RegexRequest]
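The expansion performed by to_list() can be pictured as string concatenation: beginning, then the item's text, then ending, compiled with the listed flags. The sketch reuses the structure from the example above; expand_blueprint is hypothetical, it returns compiled patterns instead of RegexRequest objects, and it assumes each text snippet is inserted verbatim (unescaped).

```python
import re

# Pattern structure and texts copied from the RegexBlueprint example above
STRUCTURE = {
    "beginning": r"^([ \t]*",
    "ending": r".*?\n(?:^(?!^[ \t]*[\-\*\#\=]{5,}.*$|^[ \t]*$).*\n)*)",
    "flags": ["MULTILINE"],
}
TEXTS = {
    "BlockOrcaVersion": "Program Version",
    "BlockOrcaContributions": "With contributions from",
}
ORDER = ["BlockOrcaVersion", "BlockOrcaContributions"]


def expand_blueprint(order, structure, texts) -> dict[str, re.Pattern]:
    """Hypothetical stand-in for to_list(): one compiled pattern per key, in order."""
    flags = 0
    for name in structure["flags"]:
        flags |= getattr(re, name)  # "MULTILINE" -> re.MULTILINE
    # Assumption: each text snippet is placed verbatim between beginning and ending
    return {
        key: re.compile(structure["beginning"] + texts[key] + structure["ending"], flags)
        for key in order
    }


# Illustrative input only (not real ORCA output)
SAMPLE = (
    "   Program Version X.Y.Z\n"
    "   (build information line)\n"
    "\n"
    " With contributions from (in alphabetic order):\n"
    "   Author A   : Feature 1\n"
    "   Author B   : Feature 2\n"
    "   ------------------------------\n"
)

patterns = expand_blueprint(ORDER, STRUCTURE, TEXTS)
version_block = patterns["BlockOrcaVersion"].search(SAMPLE).group(1)
```

With this structure, a match runs from the line containing the text until the first blank line or the first rule of five or more '-', '*', '#' or '=' characters.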

tree(depth: int = 0) str

Generates a tree-like string representation of the blueprint, illustrating the hierarchy of pattern structures and texts.

Parameters:

depth (int) – The initial indentation depth, used to represent hierarchical levels in the output string. The root level starts at 0.

Returns:

A string visualization of the blueprint, with patterns and texts formatted in a hierarchical, tree-like structure.

Return type:

str

validate_configuration() None

Checks the blueprint’s configuration for consistency and correctness, ensuring all necessary components are correctly defined.

This includes verifying the presence of each item in the order within the pattern_texts, the inclusion of required keys in the pattern_structure, the validity of specified regex flags, and the correctness of the regex pattern.

Raises:

ValueError – If any aspect of the blueprint’s configuration is found to be inconsistent or incorrect.

class pychemparse.regex_settings.RegexSettings(settings_file: str | None = None, items: dict[str, RegexRequest | RegexSettings] | None = None, order: list[str] | None = None)

Bases: object

Manages a collection of regex patterns and settings, supporting hierarchical organization and JSON-based configuration.

This class facilitates the organization, storage, and retrieval of regex patterns and their configurations. It can be directly instantiated with regex patterns and an execution order or loaded from a JSON file containing the configurations.

Variables:
  • items (dict[str, RegexRequest | RegexSettings]) – A mapping from names to RegexRequest objects or nested RegexSettings, representing individual regex patterns or groups of patterns.

  • order (list[str]) – The order in which the regex patterns or groups should be applied or processed.

add_item(name: str, item: RegexRequest | RegexSettings, rewrite: bool = False) None

Adds a new regex pattern or settings group to the RegexSettings instance.

Parameters:
  • name (str) – The unique name/key for the new item.

  • item (Union[RegexRequest, RegexSettings]) – The RegexRequest or RegexSettings instance to be added.

  • rewrite (bool, optional) – If True, an existing item with the same name will be overwritten. Defaults to False.

Raises:

ValueError – If an item with the same name already exists and rewrite is False.

get_ordered_items() list[RegexRequest | RegexSettings]

Retrieves the regex items in the order specified by self.order.

Returns:

An ordered list of RegexRequest objects and/or RegexSettings instances.

Return type:

list[RegexRequest | RegexSettings]

Raises:

ValueError – If the order list references names not present in the items dictionary.

items: dict[str, RegexRequest | RegexSettings]

load_settings(settings_file: str) None

Populates the RegexSettings instance with configurations from a specified JSON file.

Parameters:

settings_file (str) – The file path to the JSON file containing regex configurations.

order: list[str]

parse_settings(settings: dict[str, dict | list[str]]) None

Parses a settings dictionary to populate the RegexSettings instance with RegexRequest, RegexBlueprint, or nested RegexSettings.

Parameters:

settings (dict[str, dict | list[str]]) – A dictionary containing the configuration for regex patterns. It may define RegexRequest objects directly, specify RegexBlueprint configurations, or contain nested RegexSettings.

save_as_json(filename: str) None

Exports the RegexSettings configuration to a JSON file, preserving the nested structure and order of regex patterns.

Parameters:

filename (str) – The file path where the JSON representation of the regex settings should be saved.

set_order(order: list[str]) None

Defines the processing order for the regex items within this RegexSettings instance.

Parameters:

order (list[str]) – The sequence of item names, determining the order in which items are processed.

Raises:

ValueError – If any name in the provided order does not correspond to an existing item in self.items.

to_dict() dict[str, dict | list[str]]

Serializes the RegexSettings instance, including its nested structure, into a dictionary format suitable for JSON serialization.

Returns:

A dictionary representation of the RegexSettings instance, capturing the order of items and the nested regex configurations.

Return type:

dict[str, dict | list[str]]

to_list() list[RegexRequest | RegexSettings]

Converts the RegexSettings instance to a flattened list of RegexRequest objects, including those from nested RegexSettings.

Returns:

A list containing all RegexRequest objects and RegexSettings instances, expanded in order.

Return type:

list[RegexRequest | RegexSettings]

Raises:

TypeError – If an item within self.items is neither a RegexRequest nor a RegexSettings instance.
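The ordered, depth-first flattening described for to_list() can be sketched with two small stand-in dataclasses. Request and Settings are hypothetical simplifications of RegexRequest and RegexSettings, and this version expands nested groups all the way down to requests, which is one reading of the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical stand-in for RegexRequest (only the fields this sketch needs)."""
    p_subtype: str
    pattern: str


@dataclass
class Settings:
    """Hypothetical stand-in for RegexSettings: named items plus an explicit order."""
    items: dict = field(default_factory=dict)
    order: list = field(default_factory=list)


def flatten(settings: Settings) -> list[Request]:
    """Depth-first expansion of nested settings, following each level's order."""
    result = []
    for name in settings.order:
        if name not in settings.items:
            raise ValueError(f"order references unknown item {name!r}")
        item = settings.items[name]
        if isinstance(item, Request):
            result.append(item)
        elif isinstance(item, Settings):
            result.extend(flatten(item))  # nested group keeps its own order
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return result


# Hypothetical nested configuration; the top-level names mirror the default settings
known = Settings(
    items={"BlockA": Request("BlockA", "A"), "BlockB": Request("BlockB", "B")},
    order=["BlockB", "BlockA"],
)
top = Settings(
    items={"TypeKnownBlocks": known, "Spacer": Request("Spacer", r"\n")},
    order=["TypeKnownBlocks", "Spacer"],
)
flat = [request.p_subtype for request in flatten(top)]
```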

tree(depth: int = 0) str

Visualizes the regex settings hierarchy as a tree-like structure, showing the nested organization of patterns and groups.

Parameters:

depth (int) – The initial indentation level, used to visually represent the depth of nested structures.

Returns:

A string visualization of the settings hierarchy, formatted as an indented tree structure.

Return type:

str

validate_configuration() None

Ensures that each item listed in the order is present in the items dictionary and validates nested configurations.

Raises:
  • ValueError – If an ordered item is missing from the items dictionary.

  • RuntimeWarning – If there are items not included in the order.
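The two documented checks of validate_configuration() can be expressed as follows. validate_order is hypothetical; it emits the RuntimeWarning through the warnings module rather than raising it (whether pychemparse raises or warns is not confirmed here), and it does not recurse into nested settings.

```python
import warnings


def validate_order(items: dict, order: list[str]) -> None:
    """Hypothetical check mirroring the documented rules of validate_configuration()."""
    missing = [name for name in order if name not in items]
    if missing:
        raise ValueError(f"ordered items missing from items: {missing}")
    unordered = [name for name in items if name not in order]
    if unordered:
        # Reported as a warning so that processing can continue
        warnings.warn(f"items not included in order: {unordered}", RuntimeWarning)
```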

pychemparse.scripts module

pychemparse.scripts.chem_parse(input_file: str, output_file: str, file_format: str = 'auto', readable_name: str | None = None, raw_data_substrings: list[str] = [], raw_data_not_substrings: list[str] = [], mode: str = 'ORCA')

Parses an ORCA, GPAW, or VASP output file and exports the filtered data in the specified format.

This function supports exporting to CSV, JSON, HTML, and Excel formats. The output format can be auto-detected based on the file extension of the output path. Data can be filtered by readable names or the presence/absence of specific substrings in the raw data.

Parameters:
  • input_file (str) – The path to the output file to be processed (ORCA, GPAW, or VASP, as set by mode).

  • output_file (str) – The file path where the exported data will be saved.

  • file_format (str, optional) – The desired output format (‘auto’, ‘csv’, ‘json’, ‘html’, ‘xlsx’). If ‘auto’, the format is inferred from the output file extension.

  • readable_name (Optional[str], optional) – Filters elements by their readable name, if specified.

  • raw_data_substrings (list[str], optional) – Filters elements containing these substrings in their raw data.

  • raw_data_not_substrings (list[str], optional) – Filters elements not containing these substrings in their raw data.

  • mode (str, optional) – Specifies the mode of the input file, which can be ‘ORCA’, ‘GPAW’ or ‘VASP’. Default is ‘ORCA’.
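The 'auto' format detection and the substring filters can be sketched as two small helpers. Both are hypothetical: resolve_format assumes a plain extension-to-format mapping and raises ValueError for unknown extensions, and keep_element assumes that every entry in raw_data_substrings must be present (pychemparse may use different matching rules).

```python
from pathlib import Path

# Extensions for the export formats listed above
EXTENSION_TO_FORMAT = {".csv": "csv", ".json": "json", ".html": "html", ".xlsx": "xlsx"}


def resolve_format(output_file: str, file_format: str = "auto") -> str:
    """Hypothetical helper: infer the export format when file_format is 'auto'."""
    if file_format != "auto":
        return file_format
    suffix = Path(output_file).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"cannot infer export format from extension {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


def keep_element(raw_data: str, substrings: list[str], not_substrings: list[str]) -> bool:
    """Hypothetical filter: all required substrings present, none of the excluded ones."""
    return all(s in raw_data for s in substrings) and not any(
        s in raw_data for s in not_substrings
    )
```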

pychemparse.scripts.chem_parse_cli()

Command-line interface for the chem_parse function, allowing users to export data from ORCA, GPAW, or VASP output files directly from the terminal.

This CLI provides options for specifying the input and output file paths, the desired output format, filtering criteria based on readable names and raw data substrings, and the processing mode.

pychemparse.scripts.chem_to_html(input_file: str, output_file: str, insert_css: bool = True, insert_js: bool = True, insert_left_sidebar: bool = True, insert_colorcomment_sidebar: bool = True, mode: str = 'ORCA') None

Converts an ORCA, GPAW, or VASP output file to an HTML document with optional features such as CSS, JavaScript, and sidebars.

Parameters:
  • input_file (str) – The path to the input file (an ORCA, GPAW, or VASP output file, as set by mode).

  • output_file (str) – The destination path where the HTML file will be saved.

  • insert_css (bool, optional) – If True, includes default CSS styles in the HTML output.

  • insert_js (bool, optional) – If True, includes JavaScript for interactive elements in the HTML output.

  • insert_left_sidebar (bool, optional) – If True, adds a left sidebar for navigation in the HTML output.

  • insert_colorcomment_sidebar (bool, optional) – If True, adds a sidebar for color-coded comments in the HTML output.

  • mode (str, optional) – Specifies the processing mode, which can be ‘ORCA’, ‘GPAW’ or ‘VASP’. Default is ‘ORCA’.

pychemparse.scripts.chem_to_html_cli() None

CLI entry point for converting an ORCA or GPAW output file to an HTML document. Parses command-line arguments for input and output file paths and optional features.

This function facilitates the use of the conversion utility from the command line, allowing users to specify the input and output files as well as toggle optional features like CSS, JavaScript, and sidebars via command-line flags.

pychemparse.units_and_constants module

pychemparse.units_and_constants.ureg = <pint.registry.UnitRegistry object>

The shared pint unit registry. Use it to define units and constants; do not create a new UnitRegistry, because pint cannot combine quantities that belong to different registries.