API Documentation
Submodules
pychemparse.data module
- class pychemparse.data.Data(data: dict | None = None, comment: str = '')
Bases:
object
A dictionary-like class with an additional comment field. An instance evaluates to False in a boolean context when its dictionary is empty, even if a comment is present.
- Parameters:
data (dict, optional) – The main data dictionary to store the parsed data. Defaults to an empty dictionary if None is provided. An error is raised if the input is not a dictionary.
comment (str, optional) – An optional comment about the data. Defaults to an empty string.
- Variables:
data (dict) – The main data dictionary where parsed data is stored. This attribute is initialized with the data parameter.
comment (str) – A comment about the data. This attribute is initialized with the comment parameter.
- clear() None
Remove all items from the dictionary.
- copy() dict
Return a shallow copy of the dictionary.
- Returns:
A shallow copy of the dictionary.
- Return type:
dict
- get(key: str, default: any | None = None) any
Safely retrieve an item by key, returning a default value if the key does not exist.
- Parameters:
key (str) – The key of the item to retrieve.
default – The default value to return if the key does not exist. Defaults to None.
- Returns:
The value associated with the key, or the default value.
- Return type:
any
- items() ItemsView[str, any]
Return a view object that displays a list of the dictionary’s key-value tuple pairs.
- Returns:
A view object displaying the dictionary’s key-value pairs.
- Return type:
ItemsView[str, any]
- keys() KeysView[str]
Return a view object that displays a list of the dictionary’s keys.
- Returns:
A view object displaying the dictionary’s keys.
- Return type:
KeysView[str]
- pop(key: str, default=None) any
Remove the specified key and return the corresponding value. If the key is not found, default is returned if provided, otherwise KeyError is raised.
- Parameters:
key (str) – The key to remove and return its value.
default – The value to return if the key is not found. Defaults to None.
- Returns:
The value for the key if the key is in the dictionary, else default.
- Return type:
any
- popitem() tuple[str, any]
Remove and return a (key, value) pair from the dictionary in LIFO order. Raises KeyError if the dictionary is empty.
- Returns:
The removed (key, value) pair.
- Return type:
tuple[str, any]
- setdefault(key: str, default=None) any
Return the value of the key if it is in the dictionary; otherwise insert the key with the default value and return that default.
- Parameters:
key (str) – The key to check or insert in the dictionary.
default – The value to set if the key is not already in the dictionary. Defaults to None.
- Returns:
The value for the key if the key is in the dictionary, else default.
- Return type:
any
- update(*args, **kwargs) None
Update the dictionary with the key/value pairs from the supplied arguments, overwriting existing keys.
- Parameters:
args – A dictionary or an iterable of key/value pairs (as tuples or other iterables of length two).
kwargs – Additional key/value pairs to update the dictionary with.
- values() ValuesView[any]
Return a view object that displays a list of the dictionary’s values.
- Returns:
A view object of the dictionary’s values.
- Return type:
ValuesView[any]
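The truthiness rule described for Data can be illustrated without the library. The sketch below is a hypothetical stand-in: DataSketch is not part of pychemparse, and raising TypeError for non-dict input is an assumption, since the documentation only says that an error is raised.

```python
from typing import Any, Optional


class DataSketch:
    """Hypothetical stand-in for pychemparse.data.Data (not the library's code)."""

    def __init__(self, data: Optional[dict] = None, comment: str = ""):
        if data is not None and not isinstance(data, dict):
            # Assumed error type; the docs only state that an error is raised.
            raise TypeError("data must be a dict or None")
        self.data = {} if data is None else data
        self.comment = comment

    def __bool__(self) -> bool:
        # Truthiness depends only on the dictionary, never on the comment.
        return bool(self.data)

    def get(self, key: str, default: Any = None) -> Any:
        # Delegate to the underlying dictionary.
        return self.data.get(key, default)


empty = DataSketch(comment="nothing could be parsed")
filled = DataSketch({"energy": -76.4}, comment="toy value")
```

With this behaviour, a check such as `if result:` tests whether anything was parsed, regardless of whether a comment was attached.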
pychemparse.elements module
- class pychemparse.elements.AvailableBlocksGeneral
Bases:
object
Manages a registry of different types of block elements within a structured document.
This class provides a dynamic registry for block types, allowing for modular extension of block element capabilities. New block classes can be registered to the system using the class methods provided, enhancing the system’s modularity and extensibility.
- Variables:
blocks (dict[str, type[Element]]) – A mapping of block names to their corresponding block class definitions.
- classmethod register_block(block_cls: type[Element]) type[Element]
Registers a new block type in the blocks registry.
This method acts as a decorator for registering block classes. It raises a ValueError if a block class with the same name is already registered, preventing unintentional overwrites.
- classmethod rewrite_block(block_cls: type[Element]) type[Element]
Registers or redefines a block type in the blocks registry.
Unlike register_block, this method allows the redefinition of existing block types by overwriting them if necessary. It is used when an update or replacement for an existing block definition is required.
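The difference between the two class methods can be sketched with a small stand-alone registry. RegistrySketch and BlockDemo are hypothetical names, and keying the registry by the class's `__name__` is an assumption based on the phrase "a block class with the same name".

```python
class RegistrySketch:
    """Hypothetical registry mirroring the behaviour described above."""

    blocks: dict = {}

    @classmethod
    def register_block(cls, block_cls):
        # Refuse to overwrite an existing entry with the same class name.
        name = block_cls.__name__
        if name in cls.blocks:
            raise ValueError(f"{name} is already registered")
        cls.blocks[name] = block_cls
        return block_cls  # returning the class lets the method act as a decorator

    @classmethod
    def rewrite_block(cls, block_cls):
        # Register or silently replace an entry with the same class name.
        cls.blocks[block_cls.__name__] = block_cls
        return block_cls


@RegistrySketch.register_block
class BlockDemo:
    pass


first_definition = BlockDemo


@RegistrySketch.rewrite_block
class BlockDemo:  # same name: allowed only through rewrite_block
    pass


try:
    RegistrySketch.register_block(BlockDemo)
    raised = False
except ValueError:
    raised = True
```

Because both methods return the class unchanged, they can be applied as decorators directly above a block class definition.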
- class pychemparse.elements.Block(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Element
Represents a complex data block within a structured document.
Extends the Element class to encapsulate a more structured unit of data, potentially including identifiable components such as a name, header, and body. It provides methods to extract and present these components, with a default implementation for name extraction.
- Variables:
data_available (bool) – Indicates whether the block contains extractable data. Defaults to False.
position (tuple | None) – The position of the block within the larger document structure, often expressed as a range of line numbers.
specified_class_name – A placeholder for the block’s subtype if it cannot be determined during processing. Defaults to None.
- body() str
Retrieve the body content of the block.
Utilizes the extract_name_header_and_body method to extract the body of the block, which contains the main content.
- Returns:
The body content of the block.
- Return type:
str
- static body_preformat(body_raw: str) str
Format the raw body content for HTML display.
This static method wraps the raw body text in HTML <pre> tags to enhance its presentation in HTML format.
- Parameters:
body_raw (str) – The raw text of the body.
- Returns:
The formatted body text, suitable for HTML display.
- Return type:
str
- data_available: bool = False
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- header() str | None
Retrieve the block’s header, if it exists.
Uses the extract_name_header_and_body method to determine the presence and content of a header within the block.
- Returns:
The block’s header if present, otherwise None.
- Return type:
str | None
- static header_preformat(header_raw: str) str
Format the raw header content for HTML display.
This static method wraps the raw header text in HTML <pre> tags to enhance its presentation in HTML format.
- Parameters:
header_raw (str) – The raw text of the header.
- Returns:
The formatted header text, suitable for HTML display.
- Return type:
str
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- specified_class_name: str | None = None
- to_html() str
Generate an HTML representation of the block.
Constructs an HTML structure for the block, incorporating the name, header (if present), and body. The depth of the block within the document structure influences the header’s HTML level.
- Returns:
A string containing the HTML representation of the block, with header and body sections formatted and wrapped in appropriate HTML tags.
- Return type:
str
- class pychemparse.elements.BlockUnknown(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
Represents a block of an unrecognized or unknown type within a structured document.
This class is used as a fallback for blocks that do not match any of the registered block types, allowing for generic handling of unknown or unstructured data.
- data() Data
Warns about the unstructured nature of the block and returns its raw data encapsulated in a Data instance.
This method is called when attempting to process an unknown block type, issuing a warning about the lack of a structured extraction process and suggesting contributions for handling such blocks.
- Returns:
A Data instance containing the block’s raw data and a comment about its unstructured nature.
- Return type:
Data
- class pychemparse.elements.Element(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
object
Represents a basic element within a structured document, serving as a fundamental unit of data.
An Element encapsulates raw data, positional information, and provides methods for data extraction and presentation. It acts as a base class for more specialized elements tailored to specific data types or structures within a document.
- Parameters:
raw_data (str) – The raw text data associated with the element.
char_position (tuple[int, int] | None, optional) – The character position range (start, end) of the element within the larger data structure, if applicable.
line_position (tuple[int, int] | None, optional) – The line position range (start, end) of the element within the larger data structure, if applicable.
- Variables:
raw_data (str) – The raw text data associated with this element.
char_position (tuple[int, int] | None) – The character position range of the element within the data structure, or None.
line_position (tuple[int, int] | None) – The line position range of the element within the data structure, or None.
- data() Data
Process the raw data of the element to extract meaningful information.
This method is designed to be overridden by subclasses to implement specific data extraction logic tailored to the element’s structure and content.
- Returns:
An instance of the Data class containing ‘raw data’ as its content, accompanied by a comment indicating the absence of specific data extraction procedures.
- Return type:
Data
- Raises:
Warning – Indicates that no specific procedure for analyzing the data was implemented.
- static data_preformat(data_raw: str) str
Format the raw data for HTML display.
This static method wraps the raw data in HTML <pre> tags for better readability when displayed as HTML.
- Parameters:
data_raw (str) – The raw text to be formatted.
- Returns:
The formatted text wrapped in HTML <pre> tags.
- Return type:
str
- depth() int
Calculate the depth of nested structures within the element.
This method computes the maximum depth of nested lists representing the hierarchical structure of the element, indicating the complexity of its structure.
- Returns:
The maximum depth of the element’s nested list structure.
- Return type:
int
- get_structure() dict[Self, tuple | None]
Retrieve the structural representation of the element as a nested dictionary.
This method provides a way to represent the hierarchical relationships within data, where each element can contain nested sub-elements.
- Returns:
A dictionary with the element itself as the key and an empty tuple as the value, indicating no nested structure by default.
- Return type:
dict[Self, tuple | None]
- static max_depth(d) int
Compute the maximum depth of a nested list structure.
This utility method assists in determining the complexity of an element’s structural hierarchy by calculating the depth of nested lists.
- Parameters:
d (list | dict) – A nested list or dictionary representing the structure of an element or a complex data structure.
- Returns:
The maximum depth of the nested list or dictionary structure.
- Return type:
int
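A recursive depth calculation of this kind might look like the sketch below. It is not the library's implementation: the function name is hypothetical, and the conventions that a non-container counts as depth 0 and an empty container as depth 1 are assumptions.

```python
def max_depth_sketch(d) -> int:
    """Return the nesting depth of lists, tuples and dicts (assumed conventions)."""
    if isinstance(d, dict):
        children = list(d.values())
    elif isinstance(d, (list, tuple)):
        children = list(d)
    else:
        return 0  # a leaf value adds no depth
    # One level for this container plus the deepest child, if there is one.
    return 1 + max((max_depth_sketch(child) for child in children), default=0)


flat = max_depth_sketch(["a", "b"])
nested = max_depth_sketch([[], [[]]])
mixed = max_depth_sketch({"file": [{"block": ()}]})
```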
- static process_invalid_name(input_string: str) str
Clean and process an input string to generate a valid name or identifier.
This method sanitizes input strings that may contain invalid characters or formatting, ensuring the output is suitable for use as a name or identifier. It handles strings without letters by labeling them as “Unknown” and removes non-alphabetic characters from other strings.
- Parameters:
input_string (str) – The input string to be processed.
- Returns:
A cleaned and possibly truncated version of the input string, made suitable for use as a name or identifier.
- Return type:
str
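The two rules stated above (strings without letters become "Unknown"; other strings lose their non-alphabetic characters) can be sketched as follows. The function name, the choice to keep single spaces between words, the ASCII-only letter test and the `max_len` truncation limit are all assumptions for illustration.

```python
import re


def process_invalid_name_sketch(input_string: str, max_len: int = 40) -> str:
    """Hypothetical name sanitiser; not the library's implementation."""
    # Rule 1: a string with no letters at all is labelled "Unknown".
    if not re.search(r"[A-Za-z]", input_string):
        return "Unknown"
    # Rule 2: drop everything that is not a letter (spaces are kept here).
    cleaned = re.sub(r"[^A-Za-z ]+", " ", input_string)
    # Collapse runs of whitespace left behind by the removal.
    cleaned = " ".join(cleaned.split())
    # Assumed truncation to keep the identifier short.
    return cleaned[:max_len].rstrip()


separator = process_invalid_name_sketch("----------")
title = process_invalid_name_sketch("*** SCF  ITERATIONS ***")
```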
- readable_name() None
Generate a readable name for the element based on its data.
This method is intended to be overridden by subclasses to provide a meaningful, human-readable name derived from the element’s content.
- Returns:
None by default, indicating the method has not been implemented. Subclasses should override this method.
- Return type:
None
- to_html() str
Generate an HTML representation of the element.
This method provides a basic HTML structure for displaying the element’s data. Subclasses may override this method to provide more specialized HTML representations tailored to the element’s specific characteristics.
- Returns:
A string containing the HTML representation of the element, incorporating the preformatted raw data.
- Return type:
str
- exception pychemparse.elements.ExtractionError
Bases:
Exception
Custom exception class for errors encountered during energy extraction processes.
This exception is raised when there is a problem with extracting energy-related data from a given source or dataset.
- class pychemparse.elements.Spacer(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Element
- data() None
Indicate that no data is associated with a Spacer element.
Overrides the data method from the Element class to return None, reflecting the intended use of a Spacer as a representation of empty space or a separator without meaningful data.
- Returns:
None, indicating the absence of data.
- Return type:
None
- static data_preformat(data_raw: str) str
Format raw Spacer content for HTML display by replacing newlines with HTML line breaks.
- Parameters:
data_raw (str) – The raw text content of the spacer.
- Returns:
The formatted content with newlines converted to HTML line breaks.
- Return type:
str
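The contrast between the default `<pre>` wrapping and the Spacer variant can be sketched like this. Both functions are hypothetical; whether the library escapes HTML special characters, and whether it emits `<br>` or another spelling of the line-break tag, are assumptions.

```python
import html


def pre_wrap_sketch(data_raw: str) -> str:
    # Wrap text in <pre> tags; escaping is an assumption, added for safety.
    return f"<pre>{html.escape(data_raw)}</pre>"


def spacer_breaks_sketch(data_raw: str) -> str:
    # Replace each newline with an HTML line break, as described for Spacer.
    return data_raw.replace("\n", "<br>")


wrapped = pre_wrap_sketch("E < 0")
breaks = spacer_breaks_sketch("\n\n")
```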
pychemparse.file module
- class pychemparse.file.File(file_path: str, regex_settings: RegexSettings | None = None, mode: str | None = 'ORCA')
Bases:
object
Manages the processing of a file within the ChemParser framework.
This class is responsible for parsing a given file, identifying and extracting elements based on predefined regex patterns, and facilitating the generation of an HTML representation of the file’s content.
- Variables:
file_path (str) – The path to the input file being processed.
regex_settings (RegexSettings) – The regex settings utilized for pattern processing within the file.
initialized (bool) – A flag indicating whether the instance has been properly initialized with file content and regex settings.
original_text (str) – The original textual content read from the file.
_blocks (pd.DataFrame) – A DataFrame storing the processed elements identified within the file.
_marked_text (list[tuple[tuple[int, int], tuple[int, int], str | Element]]) – A list of marked text segments, each containing character and line positions alongside the corresponding text or Element object.
mode (str) – The operational mode of the file, which may affect regex settings and processing behavior. Common modes include ‘ORCA’, ‘GPAW’ and ‘VASP’.
- Parameters:
file_path (str) – The path to the file to be processed.
regex_settings (Optional[RegexSettings], optional) – Custom regex settings for pattern processing. If not provided, default settings based on the specified mode will be used.
mode (Optional[str], optional) – The processing mode, influencing default regex settings and behavior. Supported modes include ‘ORCA’, ‘GPAW’ and ‘VASP’.
- Raises:
ValueError – If an invalid mode is specified.
- create_html(css_content: str | None = None, js_content: str | None = None, insert_css: bool | None = True, insert_js: bool | None = True, insert_left_sidebar: bool | None = True, insert_colorcomment_sidebar: bool | None = True, show_progress: bool | None = False) str
Constructs a complete HTML document from processed text, integrating optional CSS and JavaScript content.
- Parameters:
css_content (Optional[str], optional) – Custom CSS to be included in the HTML document. Defaults to predefined CSS if not provided.
js_content (Optional[str], optional) – Custom JavaScript to be included in the HTML document. Defaults to predefined JavaScript if not provided.
insert_css (Optional[bool], optional) – Determines whether to include CSS content in the HTML document.
insert_js (Optional[bool], optional) – Determines whether to include JavaScript content in the HTML document.
insert_left_sidebar (Optional[bool], optional) – Specifies whether to include a left sidebar for the Table of Contents (TOC) in the HTML document.
insert_colorcomment_sidebar (Optional[bool], optional) – Specifies whether to include a comment sidebar for additional annotations in the HTML document.
show_progress (Optional[bool], optional) – Specifies whether to display a progress bar during operation.
- Returns:
The complete HTML document as a string.
- Return type:
str
- depth() int
Calculates the maximum depth of nested structures within the File instance.
- Returns:
The maximum depth of nested elements’ structures.
- Return type:
int
- static extract_data_errors_to_none(orca_element: Element) Data | None
Tries to extract data from an Element, handling errors by returning None.
This method encapsulates error handling during data extraction from an Element. If an error occurs, the issue is logged, and None is returned.
- static extract_raw_data_errors_to_none(orca_element: Element) str | None
Tries to extract raw data from an Element, returning None in case of errors.
This method is designed to handle errors gracefully during the extraction of raw data from an Element. If an error occurs, a warning is issued and None is returned.
- Parameters:
orca_element (Element) – An instance of Element from which raw data is to be extracted.
- Returns:
The extracted raw data from the Element, or None if an error occurred.
- Return type:
str | None
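The error-to-None pattern used by both static methods can be sketched as a small wrapper. The names below are hypothetical, and catching every Exception and reporting through `warnings.warn` are assumptions about how the library handles the failure.

```python
import warnings


def extract_or_none(element):
    """Call element.data(); on any error, warn and return None."""
    try:
        return element.data()
    except Exception as exc:  # broad on purpose: one bad block must not stop a scan
        warnings.warn(
            f"Could not extract data from {type(element).__name__}: {exc}"
        )
        return None


class GoodElement:
    def data(self):
        return {"value": 1}


class BrokenElement:
    def data(self):
        raise ValueError("malformed block")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    good = extract_or_none(GoodElement())
    bad = extract_or_none(BrokenElement())
```

Returning None instead of raising lets a caller process every element of a file and filter out the failures afterwards.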
- get_blocks(show_progress: bool | None = False) DataFrame
Retrieves all processed blocks as a DataFrame, ensuring the file has been initialized.
- Parameters:
show_progress (Optional[bool], optional) – Optionally displays a progress bar during initialization.
- Returns:
A DataFrame containing processed blocks with their metadata.
- Return type:
pd.DataFrame
- get_data(extract_only_raw: bool | None = False, element_type: type[Element] | None = None, readable_name: str | None = None, raw_data_substring: str | Iterable[str] | None = None, raw_data_not_substring: str | Iterable[str] | None = None, show_progress: bool | None = False) DataFrame
Retrieves and extracts data from Element instances based on search criteria, with an option to extract raw or processed data.
- Parameters:
extract_only_raw (Optional[bool], optional) – If True, only raw data will be extracted, bypassing any custom data extraction logic defined in Element subclasses.
element_type (Optional[type[Element]], optional) – The type of Element to filter by; only elements of this type will be considered.
readable_name (Optional[str], optional) – A filter for elements that have this exact readable_name.
raw_data_substring (Optional[str | Iterable[str]], optional) – A filter for elements whose raw_data contains this substring.
raw_data_not_substring (Optional[str | Iterable[str]], optional) – A filter for elements whose raw_data does not contain this substring.
show_progress (Optional[bool], optional) – If True, displays a progress bar during the operation.
- Returns:
A DataFrame of the filtered elements with their extracted data.
- Return type:
pd.DataFrame
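A typical call might look like the sketch below, which uses only the signatures documented on this page. The helper name and the file name "calculation.out" are hypothetical, and the imports sit inside the function so that the snippet can be defined even where pychemparse is not installed.

```python
def ci_neb_convergence(path: str):
    """Return the extracted data of all CI-NEB convergence blocks in a file."""
    # Imported lazily; requires pychemparse to be installed when called.
    from pychemparse.file import File
    from pychemparse.orca_elements import BlockOrcaCiNebConvergence

    orca_file = File(path, mode="ORCA")
    # Filter by element type; the other filters keep their defaults.
    return orca_file.get_data(element_type=BlockOrcaCiNebConvergence)


# Example call (needs a real ORCA output file):
# table = ci_neb_convergence("calculation.out")
```

The same filters (`readable_name`, `raw_data_substring`, `raw_data_not_substring`) are accepted by search_elements, which returns the matching elements without extracting their data.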
- get_marked_text(show_progress: bool | None = False) list[tuple[tuple[int, int], tuple[int, int], str | Element]]
Retrieves the text segments with associated markers after processing patterns, ensuring the file has been initialized.
- Parameters:
show_progress (Optional[bool], optional) – Optionally displays a progress bar during initialization.
- Returns:
A list of text segments marked with their character and line positions, alongside the corresponding text or Element object.
- Return type:
list[tuple[tuple[int, int], tuple[int, int], str | Element]]
- get_structure() dict[Self, list]
Retrieves the hierarchical structure of the File instance, representing the organization of processed elements.
- Returns:
A dictionary mapping the File instance to a list of its elements’ structures.
- Return type:
dict[Self, list]
- initialize(show_progress: bool | None = False) None
Initializes the File instance by processing patterns, if not already done, to identify and categorize text segments.
- Parameters:
show_progress (Optional[bool], optional) – Optionally displays a progress bar during the pattern processing phase.
- process_patterns(show_progress: bool | None = False) None
Identifies and categorizes text segments based on predefined regex patterns, updating the internal storage of blocks and marked text.
- Parameters:
show_progress (Optional[bool], optional) – Optionally displays a progress bar during the processing of regex patterns.
- save_as_html(output_file_path: str, insert_css: bool | None = True, insert_js: bool | None = True, insert_left_sidebar: bool | None = True, insert_colorcomment_sidebar: bool | None = True, show_progress: bool | None = False)
Generates and saves an HTML document based on the processed content of the File instance, with customizable display options.
This method leverages create_html to construct the HTML content, including optional CSS and JavaScript, as well as sidebars for navigation and comments. The complete HTML is then saved to the specified file path.
- Parameters:
output_file_path (str) – The file path, including the name and extension, where the HTML document will be saved. Existing files will be overwritten.
insert_css (Optional[bool], optional) – If True, includes CSS content in the HTML document for styling. Defaults to True.
insert_js (Optional[bool], optional) – If True, includes JavaScript content in the HTML document for interactivity. Defaults to True.
insert_left_sidebar (Optional[bool], optional) – If True, includes a left sidebar in the HTML document, typically used for a Table of Contents (TOC). Defaults to True.
insert_colorcomment_sidebar (Optional[bool], optional) – If True, includes a sidebar for additional annotations or comments in the HTML document. Defaults to True.
show_progress (Optional[bool], optional) – If True, displays a progress indicator during the HTML content generation process. Defaults to False.
- Note:
This method allows exporting the processed content to an HTML format, facilitating viewing in web browsers or further processing with HTML-compatible tools. The inclusion of CSS and JavaScript enhances the document’s appearance and interactivity, while optional sidebars provide navigation and annotation capabilities.
- search_elements(element_type: type[Element] | None = None, readable_name: str | None = None, raw_data_substring: str | Iterable[str] | None = None, raw_data_not_substring: str | Iterable[str] | None = None, show_progress: bool = False) DataFrame
Searches for Element instances based on specified criteria, such as element type, readable name, and raw data content.
- Parameters:
element_type (type[Element] | None, optional) – The class type of Element to search for, if filtering by type.
readable_name (str | None, optional) – The exact term to search for in the readable_name attribute of Element.
raw_data_substring (str | Iterable[str] | None, optional) – The substring(s) to search for within the raw_data attribute of Element.
raw_data_not_substring (str | Iterable[str] | None, optional) – The substring(s) whose absence within the raw_data attribute is required.
show_progress (bool, optional) – Whether to display a progress bar during initialization.
- Returns:
A DataFrame containing filtered Element instances based on the provided criteria.
- Return type:
pd.DataFrame
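The substring filters accept either a single string or an iterable of strings. One way to normalise and apply them is sketched below; the helper names are hypothetical, and the rule that every required substring must be present while no excluded substring may appear is an assumption about how multiple values are combined.

```python
from typing import Iterable, List, Optional, Union

Substrings = Optional[Union[str, Iterable[str]]]


def _as_list(value: Substrings) -> List[str]:
    # Normalise None / str / iterable of str to a list of strings.
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def raw_data_matches(
    raw_data: str,
    raw_data_substring: Substrings = None,
    raw_data_not_substring: Substrings = None,
) -> bool:
    required = _as_list(raw_data_substring)
    excluded = _as_list(raw_data_not_substring)
    # Assumed semantics: all required substrings present, no excluded ones.
    return all(s in raw_data for s in required) and not any(
        s in raw_data for s in excluded
    )


text = "FINAL SINGLE POINT ENERGY   -76.4"
hit = raw_data_matches(text, "ENERGY")
miss = raw_data_matches(text, ["ENERGY", "FINAL"], raw_data_not_substring="SINGLE")
```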
pychemparse.orca_elements module
- class pychemparse.orca_elements.AvailableBlocksOrca
Bases:
AvailableBlocksGeneral
A class to store all available blocks for ORCA.
- blocks: dict[str, type[Element]]
Each registered name maps to the class of the same name in pychemparse.orca_elements (for example, 'BlockOrcaScf' maps to pychemparse.orca_elements.BlockOrcaScf). Registered names:
BlockOrcaAbsorptionSpectrumViaTransitionElectricDipoleMoments
BlockOrcaAbsorptionSpectrumViaTransitionVelocityDipoleMoments
BlockOrcaAcknowledgement
BlockOrcaAllRightsReserved
BlockOrcaAuxJBasis
BlockOrcaCdSpectrum
BlockOrcaCdSpectrumViaTransitionElectricDipoleMoments
BlockOrcaCdSpectrumViaTransitionVelocityDipoleMoments
BlockOrcaCiNebConvergence
BlockOrcaContributions
BlockOrcaDipoleMoment
BlockOrcaErrorMessage
BlockOrcaFinalSinglePointEnergy
BlockOrcaGeometryConvergence
BlockOrcaHurrayCI
BlockOrcaHurrayOptimization
BlockOrcaHurrayTS
BlockOrcaIcon
BlockOrcaInputFile
BlockOrcaLibXc
BlockOrcaLibint2
BlockOrcaOrbitalBasis
BlockOrcaOrbitalEnergies
BlockOrcaPathSummaryForNebCi
BlockOrcaPathSummaryForNebTs
BlockOrcaRotationalSpectrum
BlockOrcaScf
BlockOrcaScfConverged
BlockOrcaScfType
BlockOrcaShark
BlockOrcaSoscf
BlockOrcaSpectrumType
BlockOrcaTddftExcitedStatesSinglets
BlockOrcaTddftTdaExcitedStates
BlockOrcaTerminatedNormally
BlockOrcaTimingsForIndividualModules
BlockOrcaTotalRunTime
BlockOrcaTotalScfEnergy
BlockOrcaUnrecognizedHurray
BlockOrcaUnrecognizedMessage
BlockOrcaUnrecognizedNotification
BlockOrcaUnrecognizedScf
BlockOrcaUnrecognizedWithHeader
BlockOrcaUnrecognizedWithSingeLineHeader
BlockOrcaUses
BlockOrcaVersion
BlockOrcaVibrationalFrequencies
BlockOrcaWarnings
BlockOrcaWithStandardHeader
- class pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionElectricDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaSpectrumType
Parses the ‘ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS’ block.
- class pychemparse.orca_elements.BlockOrcaAbsorptionSpectrumViaTransitionVelocityDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaSpectrumType
Parses the ‘ABSORPTION SPECTRUM VIA TRANSITION VELOCITY DIPOLE MOMENTS’ block.
- class pychemparse.orca_elements.BlockOrcaAcknowledgement(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaAllRightsReserved(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
Captures and stores the "All rights reserved" message from ORCA output files.
Example of ORCA Output:
#######################################################
#                        -***-                        #
#        Department of theory and spectroscopy        #
#      Directorship and core code : Frank Neese       #
#      Max Planck Institute fuer Kohlenforschung      #
#               Kaiser Wilhelm Platz 1                #
#                D-45470 Muelheim/Ruhr                #
#                       Germany                       #
#                                                     #
#                 All rights reserved                 #
#                        -***-                        #
#######################################################
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaAuxJBasis(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaCdSpectrum(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaSpectrumType
Parses the ‘CD SPECTRUM’ block.
- class pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionElectricDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaSpectrumType
Parses the ‘CD SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS’ block.
- class pychemparse.orca_elements.BlockOrcaCdSpectrumViaTransitionVelocityDipoleMoments(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaSpectrumType
Parses the ‘CD SPECTRUM VIA TRANSITION VELOCITY DIPOLE MOMENTS’ block.
- class pychemparse.orca_elements.BlockOrcaCiNebConvergence(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores CI-NEB convergence data from ORCA output files.
Example of ORCA Output:
                      .--------------------.
----------------------| CI-Neb convergence |-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
RMS(Fp)             0.0002797716            0.0100000000      YES
MAX(|Fp|)           0.0014572463            0.0200000000      YES
RMS(FCI)            0.0001842330            0.0010000000      YES
MAX(|FCI|)          0.0005858110            0.0020000000      YES
---------------------------------------------------------------------
- data() Data
Returns a pychemparse.data.Data object containing:
Data (pandas.DataFrame) – table with the columns Item, Value, Tolerance, and Converged.
Comment (str) – a comment string.
Parsed data example:
'Data':
         Item     Value  Tolerance Converged
0     RMS(Fp)  0.000188      0.010       YES
1   MAX(|Fp|)  0.000727      0.020       YES
2    RMS(FCI)  0.000212      0.001       YES
3  MAX(|FCI|)  0.000644      0.002       YES
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
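The table layout shown for this block can be read with a few lines of standard-library Python. The following is only a sketch of the idea, not pychemparse's implementation: the helper name `parse_convergence_table` is hypothetical, and it returns a list of dictionaries rather than a pandas.DataFrame.

```python
import re

# Hypothetical helper (not part of pychemparse): matches rows such as
# "RMS(Fp)   0.0002797716   0.0100000000   YES".
ROW = re.compile(
    r"^\s*(?P<item>\S.*?)\s+(?P<value>-?\d+\.\d+)\s+"
    r"(?P<tol>-?\d+\.\d+)\s+(?P<conv>YES|NO)\s*$"
)


def parse_convergence_table(text: str) -> list[dict]:
    """Return one dict per table row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                {
                    "Item": match["item"],
                    "Value": float(match["value"]),
                    "Tolerance": float(match["tol"]),
                    "Converged": match["conv"] == "YES",
                }
            )
    return rows


SAMPLE = """\
                      .--------------------.
----------------------| CI-Neb convergence |-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
RMS(Fp)             0.0002797716            0.0100000000      YES
MAX(|Fp|)           0.0014572463            0.0200000000      YES
RMS(FCI)            0.0001842330            0.0010000000      YES
MAX(|FCI|)          0.0005858110            0.0020000000      YES
---------------------------------------------------------------------
"""

rows = parse_convergence_table(SAMPLE)
all_converged = all(row["Converged"] for row in rows)
```

Because the item name is matched lazily, multi-word items such as "Energy change" in the geometry convergence table would be captured by the same pattern.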
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- class pychemparse.orca_elements.BlockOrcaContributions(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaDipoleMoment(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores Dipole moment from ORCA output files.
Example of ORCA Output:
-------------
DIPOLE MOMENT
-------------
                                X             Y             Z
Electronic contribution:      0.00000       0.00000       4.52836
Nuclear contribution   :      0.00000       0.00000      -8.26530
                        -----------------------------------------
Total Dipole Moment    :      0.00000       0.00000      -3.73694
                        -----------------------------------------
Magnitude (a.u.)       :      3.73694
Magnitude (Debye)      :      9.49854
or
-------------
DIPOLE MOMENT
-------------
Method             : SCF
Type of density    : Electron Density
Multiplicity       : 1
Irrep              : 0
Energy             : -379.2946629874107884 Eh
Relativity type    :
Basis              : AO
                                X                 Y                 Z
Electronic contribution:     -0.000041430       0.000000017       4.661630904
Nuclear contribution   :      0.000000009       0.000000000      -8.265300471
                        -----------------------------------------
Total Dipole Moment    :     -0.000041422       0.000000017      -3.603669567
                        -----------------------------------------
Magnitude (a.u.)       :      3.603669567
Magnitude (Debye)      :      9.159800098
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Electronic contribution, Nuclear contribution (pint.Quantity) – numpy.ndarray values of the individual contributions.
Total Dipole Moment (pint.Quantity) – numpy.ndarray of the total dipole moment components in a.u.
Magnitude (a.u.) (pint.Quantity) – total dipole moment. The numeric value in a.u. can be extracted from the pint.Quantity with the .magnitude property.
Magnitude (Debye) (pint.Quantity) – total dipole moment. The numeric value in Debye can be extracted from the pint.Quantity with the .magnitude property.
Parsed data example:
{
'Electronic contribution': <Quantity([0. 0. 5.37241], 'bohr * elementary_charge')>,
'Nuclear contribution': <Quantity([ 0. 0. -8.2653], 'bohr * elementary_charge')>,
'Total Dipole Moment': <Quantity([ 0. 0. -2.89289], 'bohr * elementary_charge')>,
'Magnitude (a.u.)': <Quantity(2.89289, 'bohr * elementary_charge')>,
'Magnitude (Debye)': <Quantity(7.35314, 'debye')>
}
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
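The relationship between the fields of this block can be checked by hand. The sketch below uses plain floats taken from the first ORCA output example, not pint quantities, and the conversion factor `AU_TO_DEBYE` is a rounded value assumed for illustration, so the Debye result agrees with the printed one only to about three decimal places.

```python
import math

# Assumed, rounded conversion factor: 1 elementary_charge * bohr in Debye.
AU_TO_DEBYE = 2.541746

# Values copied from the first ORCA output example (atomic units).
electronic = (0.00000, 0.00000, 4.52836)
nuclear = (0.00000, 0.00000, -8.26530)

# Total dipole moment is the component-wise sum of both contributions.
total = tuple(e + n for e, n in zip(electronic, nuclear))

# Magnitude is the Euclidean norm of the total dipole vector.
magnitude_au = math.sqrt(sum(component**2 for component in total))
magnitude_debye = magnitude_au * AU_TO_DEBYE
```

With the values above, `total` reproduces the printed Total Dipole Moment and `magnitude_au` reproduces Magnitude (a.u.).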
- class pychemparse.orca_elements.BlockOrcaErrorMessage(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores ORCA error message from ORCA output files.
Example of ORCA Output:
---------------------------------------------------------------------------- ERROR !!! The TS optimization did not converge but reached the maximum number of optimization cycles. As a subsequent Frequencies calculation has been requested ORCA will abort at this point of the run. ----------------------------------------------------------------------------
- data() Data
- Returns:
pychemparse.data.Data
object that contains:str
for the Error message if present
Parsed data example:
{'Error': 'ERROR !!! The optimization did not converge but reached the maximum number of optimization cycles. As a subsequent Frequencies calculation has been requested ORCA will abort at this point of the run. Please restart the calculation with the lowest energy geometry and/or a larger maxiter for the geometry optimization.'}
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaFinalSinglePointEnergy(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Final single point energy from ORCA output files.
Example of ORCA Output:
-------------------------   --------------------
FINAL SINGLE POINT ENERGY      -379.259324337759
-------------------------   --------------------
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Energy (pint.Quantity) – the final single point energy.
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
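For orientation, the value in this block can be pulled out of raw text with a single regular expression. This is a minimal sketch and not the library's parser: the function name `final_single_point_energy` is hypothetical, and it returns a plain float in hartree instead of a pint.Quantity.

```python
import re
from typing import Optional

# Hypothetical helper (not part of pychemparse).
ENERGY = re.compile(r"FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)")


def final_single_point_energy(text: str) -> Optional[float]:
    """Return the energy in hartree, or None if the line is absent."""
    match = ENERGY.search(text)
    return float(match.group(1)) if match else None


SAMPLE = """\
-------------------------   --------------------
FINAL SINGLE POINT ENERGY      -379.259324337759
-------------------------   --------------------
"""

energy_eh = final_single_point_energy(SAMPLE)
```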
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaGeometryConvergence(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores geometry convergence data from ORCA output files.
Example of ORCA Output:
                      .--------------------.
----------------------|Geometry convergence|-------------------------
Item                value                   Tolerance       Converged
---------------------------------------------------------------------
Energy change       0.0000035570            0.0000050000      YES
RMS gradient        0.0000436223            0.0001000000      YES
MAX gradient        0.0002094156            0.0003000000      YES
RMS step            0.0022222022            0.0020000000      NO
MAX step            0.0170204003            0.0040000000      NO
........................................................
Max(Bonds)      0.0003      Max(Angles)    0.02
Max(Dihed)        0.98      Max(Improp)    0.00
---------------------------------------------------------------------
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Geometry convergence data (pandas.DataFrame)
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaHurray(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- class pychemparse.orca_elements.BlockOrcaHurrayCI(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaHurray
- class pychemparse.orca_elements.BlockOrcaHurrayOptimization(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaHurray
- class pychemparse.orca_elements.BlockOrcaHurrayTS(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaHurray
- class pychemparse.orca_elements.BlockOrcaIcon(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores the ORCA ASCII-art icon from ORCA output files.
Example of ORCA Output:
#, ### #### ##### ###### ########, ,,################,,,,, ,,#################################,, ,,##########################################,, ,#########################################, ''#####, ,#############################################,, '####, ,##################################################,,,,####, ,###########'''' ''''############################### ,#####'' ,,,,##########,,,, '''####''' '#### ,##' ,,,,###########################,,, '## ' ,,###'''' '''############,,, ,,##'' '''############,,,, ,,,,,,###'' ,#'' '''#######################''' ' ''''####'''' ,#######, #######, ,#######, ## ,#' '#, ## ## ,#' '#, #''# ###### ,####, ## ## ## ,#' ## #' '# # #' '# ## ## ####### ## ,######, #####, # # '#, ,#' ## ## '#, ,#' ,# #, ## #, ,# '#######' ## ## '#######' #' '# #####' # '####'
- data_available: bool = True
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- class pychemparse.orca_elements.BlockOrcaInputFile(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaLibXc(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaLibint2(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaOrbitalBasis(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaOrbitalEnergies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores orbital energies and occupation numbers from ORCA output files.
Example of ORCA Output:
----------------
ORBITAL ENERGIES
----------------

  NO   OCC          E(Eh)            E(eV)
   0   2.0000     -14.038014      -381.9938
   1   2.0000     -13.986101      -380.5812
   2   2.0000      -0.200360        -5.4521
   3   0.0000      -0.065149        -1.7728
   4   0.0000      -0.060749        -1.6531
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Orbitals (pandas.DataFrame) – includes the columns NO, OCC, E(Eh), and E(eV). The E(Eh) and E(eV) columns represent the same energy values in different units (hartree and electronvolts, respectively). These values are extracted from the output file and should match unless there is an error in the ORCA output.
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
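The statement that the E(Eh) and E(eV) columns should agree can be verified directly. The sketch below is illustrative and independent of pychemparse: `parse_orbitals` is a hypothetical helper working on plain floats, `HARTREE_TO_EV` is a rounded conversion factor assumed here, and the HOMO/LUMO selection assumes a closed-shell listing like the one in the example.

```python
# Assumed, rounded conversion factor from hartree to electronvolt.
HARTREE_TO_EV = 27.211386

SAMPLE = """\
----------------
ORBITAL ENERGIES
----------------

  NO   OCC          E(Eh)            E(eV)
   0   2.0000     -14.038014      -381.9938
   1   2.0000     -13.986101      -380.5812
   2   2.0000      -0.200360        -5.4521
   3   0.0000      -0.065149        -1.7728
   4   0.0000      -0.060749        -1.6531
"""


def parse_orbitals(text: str) -> list[dict]:
    """Hypothetical helper: keep only lines with four numeric fields."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            rows.append(
                {
                    "NO": int(parts[0]),
                    "OCC": float(parts[1]),
                    "E(Eh)": float(parts[2]),
                    "E(eV)": float(parts[3]),
                }
            )
        except ValueError:
            # The column header also has four fields but is not numeric.
            continue
    return rows


orbitals = parse_orbitals(SAMPLE)

# Highest occupied and lowest unoccupied orbital by index.
homo = max((o for o in orbitals if o["OCC"] > 0), key=lambda o: o["NO"])
lumo = min((o for o in orbitals if o["OCC"] == 0), key=lambda o: o["NO"])
gap_ev = lumo["E(eV)"] - homo["E(eV)"]

# Both energy columns should describe the same value up to print rounding.
consistent = all(
    abs(o["E(Eh)"] * HARTREE_TO_EV - o["E(eV)"]) < 1e-3 for o in orbitals
)
```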
- class pychemparse.orca_elements.BlockOrcaPathSummaryForNebCi(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaPathSummaryForNebTs
The block captures and stores NEB-CI path summary data from ORCA output files.
Example of ORCA Output:
--------------------------------------------------------------- PATH SUMMARY --------------------------------------------------------------- All forces in Eh/Bohr. Image Dist.(Ang.) E(Eh) dE(kcal/mol) max(|Fp|) RMS(Fp) 0 0.000 -1040.28151 0.00 0.00024 0.00008 1 4.329 -1040.26830 8.29 0.00103 0.00025 2 6.607 -1040.25791 14.81 0.00120 0.00029 3 8.283 -1040.25022 19.64 0.00174 0.00042 4 9.599 -1040.24240 24.54 0.00116 0.00026 5 10.780 -1040.23790 27.37 0.00047 0.00015 <= CI 6 12.215 -1040.24200 24.80 0.00098 0.00026 7 13.815 -1040.25258 18.16 0.00076 0.00021 8 16.040 -1040.26419 10.87 0.00043 0.00013 9 19.933 -1040.27575 3.62 0.00012 0.00004 Straight line distance between images along the path: D( 0- 1) = 4.3288 Ang. D( 1- 2) = 2.2782 Ang. D( 2- 3) = 1.6757 Ang. D( 3- 4) = 1.3168 Ang. D( 4- 5) = 1.1801 Ang. D( 5- 6) = 1.4358 Ang. D( 6- 7) = 1.5995 Ang. D( 7- 8) = 2.2254 Ang. D( 8- 9) = 3.8933 Ang.
- class pychemparse.orca_elements.BlockOrcaPathSummaryForNebTs(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores NEB-TS path summary data from ORCA output files.
Example of ORCA Output:
--------------------------------------------------------------- PATH SUMMARY FOR NEB-TS --------------------------------------------------------------- All forces in Eh/Bohr. Global forces for TS. Image E(Eh) dE(kcal/mol) max(|Fp|) RMS(Fp) 0 -1040.28151 0.00 0.00024 0.00008 1 -1040.26641 9.48 0.00357 0.00076 2 -1040.25443 17.00 0.00387 0.00111 3 -1040.24519 22.79 0.00279 0.00095 4 -1040.23692 27.98 0.00459 0.00133 5 -1040.23342 30.18 0.00189 0.00067 <= CI TS -1040.23850 26.99 0.00022 0.00005 <= TS 6 -1040.23665 28.15 0.00216 0.00079 7 -1040.24833 20.82 0.00200 0.00076 8 -1040.26217 12.14 0.00200 0.00058 9 -1040.27575 3.62 0.00012 0.00004
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Data (pandas.DataFrame) – one row per image with the columns Image, E(Eh), dE(kcal/mol), max(|Fp|), RMS(Fp), and Comment. The numeric columns carry pint units, and the Comment column holds markers such as <= CI and <= TS, as shown in the example below.
Parsed data example:
{'Data': Image E(Eh) dE(kcal/mol) 0 0 -1040.28151 electron_volt 0.0 kilocalorie / mole 1 1 -1040.27082 electron_volt 6.71 kilocalorie / mole 2 2 -1040.2608 electron_volt 13.0 kilocalorie / mole 3 3 -1040.2518 electron_volt 18.64 kilocalorie / mole 4 4 -1040.24453 electron_volt 23.21 kilocalorie / mole 5 5 -1040.24169 electron_volt 24.99 kilocalorie / mole 6 TS -1040.24272 electron_volt 24.34 kilocalorie / mole 7 6 -1040.24575 electron_volt 22.44 kilocalorie / mole 8 7 -1040.25472 electron_volt 16.81 kilocalorie / mole 9 8 -1040.26597 electron_volt 9.75 kilocalorie / mole 10 9 -1040.27575 electron_volt 3.62 kilocalorie / mole max(|Fp|) RMS(Fp) Comment 0 0.00023 hartree / bohr 7e-05 hartree / bohr 1 0.00068 hartree / bohr 0.00023 hartree / bohr 2 0.00072 hartree / bohr 0.00023 hartree / bohr 3 0.00073 hartree / bohr 0.00022 hartree / bohr 4 0.00067 hartree / bohr 0.0002 hartree / bohr 5 0.00063 hartree / bohr 0.00021 hartree / bohr <= CI 6 7e-05 hartree / bohr 2e-05 hartree / bohr <= TS 7 0.00058 hartree / bohr 0.00021 hartree / bohr 8 0.00055 hartree / bohr 0.00019 hartree / bohr 9 0.00065 hartree / bohr 0.00019 hartree / bohr 10 0.00018 hartree / bohr 5e-05 hartree / bohr }
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
- class pychemparse.orca_elements.BlockOrcaRotationalSpectrum(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaScf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaScfType
The block captures and stores SCF data from ORCA output files.
Example of ORCA Output:
-------------------------------------S-C-F--------------------------------------- Iteration Energy (Eh) Delta-E RMSDP MaxDP Damp Time(sec) --------------------------------------------------------------------------------- *** Starting incremental Fock matrix formation *** *** Initializing SOSCF *** *** Constraining orbitals *** *** Switching to L-BFGS *** Constrained orbitals (energetic order) 30 31 Constrained orbitals (compact order) 31 30
- data() Data
Returns a pychemparse.data.Data object containing:
Data (pandas.DataFrame) – with columns Iteration, Energy (Eh), Delta-E, RMSDP, MaxDP, Damp, Time(sec). Time(sec) is represented as a timedelta object. Energy (Eh) is represented as a pint.Quantity; the numeric value can be extracted with the .magnitude property.
Comments (pandas.DataFrame) – with columns Iteration and Comment.
Name (str) – name of the block.
Parsed data example:
{'Data': Empty DataFrame Columns: [Iteration, Energy ( Eh), Delta-E, RMSDP, MaxDP, Damp, Time(sec)] Index: [], 'Comments': Iteration Comment 0 0 *** Starting incremental Fock matrix formatio... 1 0 *** Initializing SOSCF *** 2 0 *** Constraining orbitals *** 3 0 *** Switching to L-BFGS ***, 'Name': 'S-C-F'}
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
- class pychemparse.orca_elements.BlockOrcaScfConverged(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores SCF convergence message from ORCA output files.
Example of ORCA Output:
*****************************************************
*                      SUCCESS                      *
*           SCF CONVERGED AFTER 20 CYCLES           *
*****************************************************
- data() Data
- Returns:
pychemparse.data.Data object that contains:
Success (bool) – whether the extraction succeeded.
Cycles (int) – the number of SCF cycles.
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
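As a rough illustration of what this block carries, the cycle count can be extracted with one regular expression. The function `scf_converged` below is a hypothetical stand-in, not the pychemparse parser; it returns a plain dict using the two field names documented above.

```python
import re

# Hypothetical helper (not part of pychemparse).
CONVERGED = re.compile(r"SCF CONVERGED AFTER\s+(\d+)\s+CYCLES")


def scf_converged(text: str) -> dict:
    """Report whether the banner was found and, if so, the cycle count."""
    match = CONVERGED.search(text)
    return {
        "Success": match is not None,
        "Cycles": int(match.group(1)) if match else None,
    }


SAMPLE = """\
*****************************************************
*                      SUCCESS                      *
*           SCF CONVERGED AFTER 20 CYCLES           *
*****************************************************
"""

result = scf_converged(SAMPLE)
```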
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaScfType(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores SCF data from ORCA output files.
Example of ORCA Output:
-------------------------------------S-C-F--------------------------------------- Iteration Energy (Eh) Delta-E RMSDP MaxDP Damp Time(sec) --------------------------------------------------------------------------------- *** Starting incremental Fock matrix formation *** *** Initializing SOSCF *** *** Constraining orbitals *** *** Switching to L-BFGS *** Constrained orbitals (energetic order) 30 31 Constrained orbitals (compact order) 31 30
or
---------------------------------------S-O-S-C-F-------------------------------------- Iteration Energy (Eh) Delta-E RMSDP MaxDP MaxGrad Time(sec) -------------------------------------------------------------------------------------- 1 -379.2796837014277571 0.00e+00 0.00e+00 0.00e+00 3.00e-02 0.3 *** Restarting incremental Fock matrix formation *** 2 -379.2796837014277571 0.00e+00 5.39e-03 2.36e-01 3.00e-02 0.3 3 -379.2788786204820326 8.05e-04 2.96e-03 1.30e-01 3.68e-02 0.3 4 -379.2897810987828962 -1.09e-02 1.69e-03 1.37e-01 9.46e-03 0.2 5 -379.2878642728886689 1.92e-03 8.10e-04 7.42e-02 1.68e-02 0.2 6 -379.2909711775516826 -3.11e-03 7.04e-04 3.11e-02 4.12e-03 0.2 ***Gradient convergence achieved*** *** Unconstraining orbitals *** *** Restarting Hessian update and switching to L-SR1 *** 7 -379.2904844538218185 4.87e-04 1.27e-03 4.23e-02 1.72e-02 0.2 8 -379.2892451088814596 1.24e-03 9.18e-04 7.40e-02 2.70e-02 0.3 9 -379.2943354063930883 -5.09e-03 3.93e-04 1.30e-02 2.96e-03 0.2 10 -379.2945957143243731 -2.60e-04 2.02e-04 7.28e-03 1.37e-03 0.2 11 -379.2946565737383935 -6.09e-05 7.77e-04 4.26e-02 4.79e-04 0.2 12 -379.2946442625134296 1.23e-05 3.75e-03 2.20e-01 9.99e-04 0.2 13 -379.2946572622200847 -1.30e-05 8.03e-04 4.98e-02 7.21e-04 0.2 14 -379.2946626618473829 -5.40e-06 4.17e-04 2.11e-02 1.05e-04 0.2 15 -379.2946629954453783 -3.34e-07 1.04e-03 5.09e-02 4.80e-05 0.2 ***Gradient convergence achieved***
- extract_name_header_and_body() tuple[str, str | None, str]
Identifies and separates the name, header, and body of the block based on a SCF header format.
Utilizes regular expressions to discern the header portion from the body, processing the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.
- Returns:
The name of the block, the header content (or None if a header is not present), and the body of the block.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaShark(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaSoscf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaScf
The block captures and stores SOSCF data from ORCA output files.
Example of ORCA Output:
---------------------------------------S-O-S-C-F-------------------------------------- Iteration Energy (Eh) Delta-E RMSDP MaxDP MaxGrad Time(sec) -------------------------------------------------------------------------------------- 1 -379.2796837014277571 0.00e+00 0.00e+00 0.00e+00 3.00e-02 0.3 *** Restarting incremental Fock matrix formation *** 2 -379.2796837014277571 0.00e+00 5.39e-03 2.36e-01 3.00e-02 0.3 3 -379.2788786204820326 8.05e-04 2.96e-03 1.30e-01 3.68e-02 0.3 4 -379.2897810987828962 -1.09e-02 1.69e-03 1.37e-01 9.46e-03 0.2 5 -379.2878642728886689 1.92e-03 8.10e-04 7.42e-02 1.68e-02 0.2 6 -379.2909711775516826 -3.11e-03 7.04e-04 3.11e-02 4.12e-03 0.2 ***Gradient convergence achieved*** *** Unconstraining orbitals *** *** Restarting Hessian update and switching to L-SR1 *** 7 -379.2904844538218185 4.87e-04 1.27e-03 4.23e-02 1.72e-02 0.2 8 -379.2892451088814596 1.24e-03 9.18e-04 7.40e-02 2.70e-02 0.3 9 -379.2943354063930883 -5.09e-03 3.93e-04 1.30e-02 2.96e-03 0.2 10 -379.2945957143243731 -2.60e-04 2.02e-04 7.28e-03 1.37e-03 0.2 11 -379.2946565737383935 -6.09e-05 7.77e-04 4.26e-02 4.79e-04 0.2 12 -379.2946442625134296 1.23e-05 3.75e-03 2.20e-01 9.99e-04 0.2 13 -379.2946572622200847 -1.30e-05 8.03e-04 4.98e-02 7.21e-04 0.2 14 -379.2946626618473829 -5.40e-06 4.17e-04 2.11e-02 1.05e-04 0.2 15 -379.2946629954453783 -3.34e-07 1.04e-03 5.09e-02 4.80e-05 0.2 ***Gradient convergence achieved***
- data() Data
Returns a pychemparse.data.Data object containing:
Data (pandas.DataFrame) – with columns Iteration, Energy (Eh), Delta-E, RMSDP, MaxDP, MaxGrad, Time(sec). Time(sec) is represented as a timedelta object. Energy (Eh) is represented as a pint.Quantity; the numeric value can be extracted with the .magnitude property.
Comments (pandas.DataFrame) – with columns Iteration and Comment.
Name (str) – name of the block.
Parsed data example:
{'Data': Iteration Energy (Eh) Delta-E RMSDP MaxDP MaxGrad Time(sec) 0 1 -440.42719635301455 hartree 0.000000e+00 0.000000 0.0000 0.029500 0 days 00:00:00.500000 1 2 -440.42719635301455 hartree 0.000000e+00 0.004710 0.2320 0.029500 0 days 00:00:00.400000 2 3 -440.49687163902 hartree -6.970000e-02 0.012600 1.1300 0.011100 0 days 00:00:00.400000, 'Comments': Iteration Comment 0 1 *** Restarting incremental Fock matrix formati... 1 13 **** Energy Check signals convergence **** 2 13 *** Unconstraining orbitals *** 3 13 *** Restarting Hessian update and switching to... 4 21 *** Restarting incremental Fock matrix formati... 5 33 **** Energy Check signals convergence ****, 'Name': 'S-O-S-C-F'}
- Return type:
Data
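The split into a Data table and a Comments table can be pictured as follows. This is a sketch under assumptions, not the library's code: `parse_scf_body` is a hypothetical name, only three of the seven columns are kept, and each comment is attributed to the most recently seen iteration (0 before the first row), which is inferred from the parsed data examples rather than confirmed by the source.

```python
import re
from datetime import timedelta

# Iteration rows: index, energy, four numeric columns, time in seconds.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+(?:\.\d+)?)\s*$"
)
# Comment lines are wrapped in three or more asterisks on both sides.
COMMENT = re.compile(r"^\s*(\*{3,}.*\*{3,})\s*$")


def parse_scf_body(text: str):
    """Hypothetical helper: return (rows, comments) as lists of dicts."""
    rows, comments = [], []
    last_iteration = 0  # assumed default for comments before the first row
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            last_iteration = int(row.group(1))
            rows.append(
                {
                    "Iteration": last_iteration,
                    "Energy (Eh)": float(row.group(2)),
                    "Time(sec)": timedelta(seconds=float(row.group(7))),
                }
            )
            continue
        comment = COMMENT.match(line)
        if comment:
            comments.append(
                {"Iteration": last_iteration, "Comment": comment.group(1)}
            )
    return rows, comments


SAMPLE = """\
    1    -379.2796837014277571     0.00e+00  0.00e+00  0.00e+00  3.00e-02   0.3
               *** Restarting incremental Fock matrix formation ***
    2    -379.2796837014277571     0.00e+00  5.39e-03  2.36e-01  3.00e-02   0.3
    3    -379.2788786204820326     8.05e-04  2.96e-03  1.30e-01  3.68e-02   0.3
              ***Gradient convergence achieved***
"""

rows, comments = parse_scf_body(SAMPLE)
```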
- class pychemparse.orca_elements.BlockOrcaSpectrumType(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores spectrum data from ORCA output files.
Example of ORCA Output:
----------------------------------------------------------------------------- ABSORPTION SPECTRUM VIA TRANSITION ELECTRIC DIPOLE MOMENTS ----------------------------------------------------------------------------- State Energy Wavelength fosc T2 TX TY TZ (cm-1) (nm) (au**2) (au) (au) (au) ----------------------------------------------------------------------------- 1 16903.5 591.6 0.000000000 0.00000 0.00000 -0.00000 -0.00000 2 22365.6 447.1 0.000000000 0.00000 -0.00000 -0.00000 0.00000 3 23649.8 422.8 0.000000000 0.00000 0.00000 0.00000 0.00000 4 25396.9 393.7 0.002096634 0.02718 -0.13602 0.04159 0.08336 5 26409.5 378.7 0.626104251 7.80481 -1.15488 -2.10174 1.43308 6 28468.1 351.3 0.000000000 0.00000 0.00000 0.00000 -0.00000 7 28944.3 345.5 0.000000000 0.00000 0.00000 -0.00000 -0.00000 8 28964.5 345.3 0.000000000 0.00000 -0.00000 0.00000 0.00000 9 29986.3 333.5 0.025998669 0.28543 -0.44658 -0.18107 0.23069 10 30178.3 331.4 0.000000000 0.00000 0.00000 0.00000 -0.00000 11 31055.6 322.0 0.000000000 0.00000 -0.00000 -0.00000 0.00000 12 32047.8 312.0 0.000000000 0.00000 -0.00000 -0.00000 0.00000 13 32343.8 309.2 0.000000000 0.00000 -0.00000 0.00000 -0.00000 14 32365.6 309.0 0.012474234 0.12688 -0.23853 0.23551 -0.12051 15 32454.2 308.1 0.023480417 0.23818 0.00690 -0.48392 0.06292 16 33446.2 299.0 0.001756413 0.01729 -0.06205 0.06809 -0.09382 17 34637.6 288.7 0.000000000 0.00000 0.00000 -0.00000 -0.00000 18 35255.9 283.6 0.000000000 0.00000 0.00000 -0.00000 -0.00000 ...
- data() Data
Parses the spectrum block and returns a Data object containing a DataFrame with units applied.
- Returns:
The parsed data.
- Return type:
Data
Parsed data example:
'Data': Transition Energy (eV) Energy (cm-1) 0 0-1A -> 1-3A 2.09817 electron_volt 16922.9 / centimeter 1 0-1A -> 2-3A 2.773662 electron_volt 22371.1 / centimeter 2 0-1A -> 3-3A 2.932123 electron_volt 23649.2 / centimeter 3 0-1A -> 4-1A 3.149596 electron_volt 25403.2 / centimeter 4 0-1A -> 5-1A 3.275905 electron_volt 26422.0 / centimeter .. ... ... ... 95 0-1A -> 96-1A 6.641358 electron_volt 53566.2 / centimeter 96 0-1A -> 97-3A 6.656552 electron_volt 53688.7 / centimeter 97 0-1A -> 98-3A 6.701102 electron_volt 54048.0 / centimeter 98 0-1A -> 99-3A 6.727343 electron_volt 54259.7 / centimeter 99 0-1A -> 100-1A 6.746272 electron_volt 54412.4 / centimeter Wavelength (nm) fosc(D2) D2 (au**2) 0 590.9 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 1 447.0 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 2 422.8 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 3 393.7 nanometer 0.002212 0.02866 bohr ** 2 * elementary_charge ** 2 4 378.5 nanometer 0.624346 7.77921 bohr ** 2 * elementary_charge ** 2 .. ... ... ... 95 186.7 nanometer 0.191542 1.1772 bohr ** 2 * elementary_charge ** 2 96 186.3 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 97 185.0 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 98 184.3 nanometer 0.000000 0.0 bohr ** 2 * elementary_charge ** 2 99 183.8 nanometer 0.233894 1.41514 bohr ** 2 * elementary_charge ** 2 DX (au) DY (au) 0 0.0 bohr * elementary_charge -0.0 bohr * elementary_charge 1 0.0 bohr * elementary_charge 0.0 bohr * elementary_charge 2 -0.0 bohr * elementary_charge -0.0 bohr * elementary_charge 3 -0.14047 bohr * elementary_charge 0.03394 bohr * elementary_charge 4 -1.15138 bohr * elementary_charge -2.09801 bohr * elementary_charge .. ... ... 
95 0.94121 bohr * elementary_charge -0.11391 bohr * elementary_charge 96 0.0 bohr * elementary_charge -0.0 bohr * elementary_charge 97 0.0 bohr * elementary_charge -0.0 bohr * elementary_charge 98 -0.0 bohr * elementary_charge -0.0 bohr * elementary_charge 99 0.60355 bohr * elementary_charge -0.5407 bohr * elementary_charge DZ (au) 0 0.0 bohr * elementary_charge 1 0.0 bohr * elementary_charge 2 0.0 bohr * elementary_charge 3 0.08819 bohr * elementary_charge 4 1.43244 bohr * elementary_charge .. ... 95 0.52758 bohr * elementary_charge 96 0.0 bohr * elementary_charge 97 0.0 bohr * elementary_charge 98 0.0 bohr * elementary_charge 99 0.87092 bohr * elementary_charge
- data_available: bool = True
Formatted data is available for this block.
- class pychemparse.orca_elements.BlockOrcaTddftExcitations(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores TD-DFT excited states data for singlets from ORCA output files.
Example of ORCA Output:
--------------------------------
TD-DFT EXCITED STATES (SINGLETS)
--------------------------------

the weight of the individual excitations are printed if larger than 0.01

STATE  1:  E=   0.154808 au      4.213 eV    33976.3 cm**-1 <S**2> =   0.000000
    29a ->  31a  :     0.078253
    30a ->  32a  :     0.907469
or
-------------------------
TD-DFT/TDA EXCITED STATES
-------------------------

the weight of the individual excitations are printed if larger than 1.0e-02

UHF/UKS reference: multiplicity estimated based on rounded <S**2> value, RELEVANCE IS LIMITED!

STATE  1:  E=   0.077106 au      2.098 eV    16922.9 cm**-1 <S**2> =   2.000000 Mult 3
    90a ->  91a  :     0.468442 (c=  0.68442790)
    90b ->  91b  :     0.468442 (c= -0.68442790)

STATE  2:  E=   0.101930 au      2.774 eV    22371.1 cm**-1 <S**2> =   2.000000 Mult 3
    89a ->  91a  :     0.418245 (c=  0.64671829)
    89a ->  92a  :     0.050001 (c= -0.22360974)
    89b ->  91b  :     0.418245 (c= -0.64671829)
    89b ->  92b  :     0.050001 (c=  0.22360974)
- data() Data
- Returns:
pychemparse.data.Data object that contains the states (int) as keys and their respective details as sub-dictionaries. The Energy (eV) values are stored as pint.Quantity. The Transitions are stored in a list, with each transition represented as a dict containing the From Orbital (str: number+a|b), To Orbital (str: number+a|b), and Coefficient (float).
Parsed data example:
{
1: {
'Energy (eV)': <Quantity(4.647, 'electron_volt')>,
'Transitions': [
{'From Orbital': '29a', 'To Orbital': '32a', 'Coefficient': 0.055845},
{'From Orbital': '30a', 'To Orbital': '31a', 'Coefficient': 0.906577}
]
},
# Additional states follow the same structure
}
- Return type:
Data
- data_available: bool = True
Formatted data is available for this block.
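The nested structure described for this block can be reproduced from raw text with two regular expressions. The following is a sketch only, not pychemparse's implementation: `parse_states` is a hypothetical helper, energies are plain floats instead of pint.Quantity objects, and the sample lines are abridged from the ORCA output examples.

```python
import re

# "STATE  1:  E=   0.154808 au      4.213 eV ..." -> state index, au, eV
STATE = re.compile(r"STATE\s+(\d+):\s+E=\s*(-?\d+\.\d+)\s+au\s+(-?\d+\.\d+)\s+eV")
# "29a ->  31a  :     0.078253" -> from orbital, to orbital, printed value
TRANSITION = re.compile(r"(\d+[ab])\s*->\s*(\d+[ab])\s*:\s*(-?\d+\.\d+)")


def parse_states(text: str) -> dict:
    """Hypothetical helper: map state number to energy and transitions."""
    states = {}
    current = None
    for line in text.splitlines():
        state = STATE.search(line)
        if state:
            current = int(state.group(1))
            states[current] = {
                "Energy (eV)": float(state.group(3)),
                "Transitions": [],
            }
            continue
        transition = TRANSITION.search(line)
        if transition and current is not None:
            states[current]["Transitions"].append(
                {
                    "From Orbital": transition.group(1),
                    "To Orbital": transition.group(2),
                    "Coefficient": float(transition.group(3)),
                }
            )
    return states


SAMPLE = """\
the weight of the individual excitations are printed if larger than 0.01

STATE  1:  E=   0.154808 au      4.213 eV    33976.3 cm**-1
    29a ->  31a  :     0.078253
    30a ->  32a  :     0.907469
STATE  2:  E=   0.101930 au      2.774 eV    22371.1 cm**-1
    89a ->  91a  :     0.418245 (c=  0.64671829)
    89b ->  92b  :     0.050001 (c=  0.22360974)
"""

states = parse_states(SAMPLE)
```

Trailing annotations such as `(c= ...)` are ignored because the transition pattern stops after the first number following the colon.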
- class pychemparse.orca_elements.BlockOrcaTddftExcitedStatesSinglets(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaTddftExcitations
- class pychemparse.orca_elements.BlockOrcaTddftTdaExcitedStates(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaTddftExcitations
- class pychemparse.orca_elements.BlockOrcaTerminatedNormally(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Termination status from ORCA output files.
Example of ORCA Output:
****ORCA TERMINATED NORMALLY****
- data() Data
- Returns:
pychemparse.data.Data object that contains: bool Termination status, which is always True; otherwise you wouldn't find this block.
- Return type:
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.orca_elements.BlockOrcaTimingsForIndividualModules(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores timings for individual modules from ORCA output files.
Example of ORCA Output:
Timings for individual modules: Sum of individual times ... 509.556 sec (= 8.493 min) GTO integral calculation ... 7.722 sec (= 0.129 min) 1.5 % SCF iterations ... 123.801 sec (= 2.063 min) 24.3 % SCF Gradient evaluation ... 26.450 sec (= 0.441 min) 5.2 % Geometry relaxation ... 0.826 sec (= 0.014 min) 0.2 % Analytical frequency calculation... 350.758 sec (= 5.846 min) 68.8 %
- data() Data
- Returns:
pychemparse.data.Data object that contains: dict Timings with module names as keys and timings as datetime.timedelta objects.
- Return type:
Parsed data example:
'Sum of individual times': datetime.timedelta(seconds=24, microseconds=36000), 'GTO integral calculation': datetime.timedelta(seconds=8, microseconds=80000), 'SCF iterations': datetime.timedelta(seconds=15, microseconds=956000)
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
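As a rough illustration of the module-timings mapping described for this block, the sketch below converts the seconds column into datetime.timedelta values. The function name `parse_timings`, the pattern, and the line layout of the sample are assumptions, not the package's own code.

```python
import re
from datetime import timedelta

# Excerpt of the ORCA example above; the line layout is assumed.
SAMPLE = """\
Timings for individual modules:

Sum of individual times         ...      509.556 sec (=   8.493 min)
GTO integral calculation        ...        7.722 sec (=   0.129 min)   1.5 %
SCF iterations                  ...      123.801 sec (=   2.063 min)  24.3 %
"""

# Hypothetical pattern: "<module name> ... <seconds> sec"
LINE_RE = re.compile(r"^[ \t]*(.+?)[ \t]*\.\.\.[ \t]*([\d.]+)[ \t]+sec", re.MULTILINE)


def parse_timings(text: str) -> dict:
    """Map each module name to its duration as a timedelta."""
    return {
        match.group(1): timedelta(seconds=float(match.group(2)))
        for match in LINE_RE.finditer(text)
    }


timings = parse_timings(SAMPLE)
```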
- class pychemparse.orca_elements.BlockOrcaTotalRunTime(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Total run time from ORCA output files.
Example of ORCA Output:
TOTAL RUN TIME: 0 days 0 hours 1 minutes 20 seconds 720 msec
- data() Data
- Returns:
pychemparse.data.Data object that contains: datetime.timedelta Run Time representing the total run time in days, hours, minutes, seconds, and milliseconds.
- Return type:
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
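The run-time line of BlockOrcaTotalRunTime maps naturally onto datetime.timedelta. The following sketch shows one way to do that conversion; `parse_run_time` and `RUN_TIME_RE` are hypothetical names used only for illustration.

```python
import re
from datetime import timedelta

# Hypothetical pattern for the "TOTAL RUN TIME" line shown in the example above.
RUN_TIME_RE = re.compile(
    r"TOTAL RUN TIME:\s*(\d+)\s+days\s+(\d+)\s+hours\s+(\d+)\s+minutes"
    r"\s+(\d+)\s+seconds\s+(\d+)\s+msec"
)


def parse_run_time(line: str) -> timedelta:
    """Convert the five integer fields into a single timedelta."""
    match = RUN_TIME_RE.search(line)
    if match is None:
        raise ValueError("no TOTAL RUN TIME line found")
    days, hours, minutes, seconds, msec = (int(v) for v in match.groups())
    return timedelta(
        days=days, hours=hours, minutes=minutes, seconds=seconds, milliseconds=msec
    )


run_time = parse_run_time(
    "TOTAL RUN TIME: 0 days 0 hours 1 minutes 20 seconds 720 msec"
)
```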
- class pychemparse.orca_elements.BlockOrcaTotalScfEnergy(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores Total SCF Energy from ORCA output files.
Example of ORCA Output:
---------------- TOTAL SCF ENERGY ---------------- Total Energy : -379.43011624 Eh -10324.81837 eV Components: Nuclear Repulsion : 376.82729155 Eh 10253.99191 eV Electronic Energy : -756.25740779 Eh -20578.81027 eV One Electron Energy: -1258.15590029 Eh -34236.16258 eV Two Electron Energy: 501.89849250 Eh 13657.35231 eV Virial components: Potential Energy : -757.03875139 Eh -20600.07171 eV Kinetic Energy : 377.60863515 Eh 10275.25335 eV Virial Ratio : 2.00482373 DFT components: N(Alpha) : 31.000002566977 electrons N(Beta) : 31.000002566977 electrons N(Total) : 62.000005133953 electrons E(X) : -51.506470961700 Eh E(C) : -2.061628237949 Eh E(XC) : -53.568099199649 Eh DFET-embed. en. : 0.000000000000 Eh
- data() Data
- Returns:
pychemparse.data.Data object that contains:
- dict Total Energy with pint.Quantity Value in Eh and pint.Quantity Value in eV
- dict Components, Virial components, and DFT components (may differ between ORCA versions), each with dict sub-dictionaries of data.
If a value is reported in multiple units, the sub-dictionary stores one entry per unit, with the unit as key. Otherwise, the value is stored directly in the dict as pint.Quantity. The values are expected to represent the same quantity; if they do not, there is an error in ORCA.
Output blocks example from ORCA 6:
{ 'Total Energy': {'Value in Eh': <Quantity(-379.430116, 'hartree')>, 'Value in eV': <Quantity(-10324.8184, 'electron_volt')>}, 'Components': {'Nuclear Repulsion': {'Value in Eh': <Quantity(376.827292, 'hartree')>, 'Value in eV': <Quantity(10253.9919, 'electron_volt')>}, 'Electronic Energy': {'Value in Eh': <Quantity(-756.257408, 'hartree')>, 'Value in eV': <Quantity(-20578.8103, 'electron_volt')>}, 'One Electron Energy': {'Value in Eh': <Quantity(-1258.1559, 'hartree')>, 'Value in eV': <Quantity(-34236.1626, 'electron_volt')>}, 'Two Electron Energy': {'Value in Eh': <Quantity(501.898492, 'hartree')>, 'Value in eV': <Quantity(13657.3523, 'electron_volt')>}}, 'Virial components': {'Potential Energy': {'Value in Eh': <Quantity(-757.038751, 'hartree')>, 'Value in eV': <Quantity(-20600.0717, 'electron_volt')>}, 'Kinetic Energy': {'Value in Eh': <Quantity(377.608635, 'hartree')>, 'Value in eV': <Quantity(10275.2534, 'electron_volt')>}, 'Virial Ratio': 2.00482373}, 'DFT components': {'N(Alpha)': <Quantity(31.0000026, 'electron')>, 'N(Beta)': <Quantity(31.0000026, 'electron')>, 'N(Total)': <Quantity(62.0000051, 'electron')>, 'E(X)': <Quantity(-51.506471, 'hartree')>, 'E(C)': <Quantity(-2.06162824, 'hartree')>, 'E(XC)': <Quantity(-53.5680992, 'hartree')>, 'DFET-embed. en.': <Quantity(0.0, 'hartree')>} }
- Return type:
- data_available: bool = True
Formatted data is available for this block.
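Since both unit representations of an energy are expected to describe the same quantity, a simple cross-check is possible. The sketch below works on plain floats rather than pint.Quantity objects; `units_agree` is a hypothetical helper, and the tolerance is chosen loosely to absorb rounding in the printed values and small differences in the conversion constant.

```python
import math

# Hartree energy in eV (CODATA 2018 value).
HARTREE_TO_EV = 27.211386245988


def units_agree(value_eh: float, value_ev: float, rel_tol: float = 1e-6) -> bool:
    """Return True if the Eh and eV values describe the same energy."""
    return math.isclose(value_eh * HARTREE_TO_EV, value_ev, rel_tol=rel_tol)


# Values from the Total Energy line of the ORCA example above.
total_energy = {"Value in Eh": -379.43011624, "Value in eV": -10324.81837}
consistent = units_agree(total_energy["Value in Eh"], total_energy["Value in eV"])
```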
- class pychemparse.orca_elements.BlockOrcaUnrecognizedHurray(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaHurray
- class pychemparse.orca_elements.BlockOrcaUnrecognizedMessage(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaUnrecognizedNotification(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaUnrecognizedScf(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaScfType
- class pychemparse.orca_elements.BlockOrcaUnrecognizedWithHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
- class pychemparse.orca_elements.BlockOrcaUnrecognizedWithSingeLineHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
- class pychemparse.orca_elements.BlockOrcaUses(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaVersion(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores ORCA version from ORCA output files.
Example of ORCA Output:
Program Version 5.0.0 - RELEASE - (SVN: $Rev: 19529$) ($Date: 2021-06-28 11:36:33 +0200 (Mo, 28 Jun 2021) $)
- data() Data
- Returns:
pychemparse.data.Data object that contains: str Version
- Return type:
- data_available: bool = True
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- class pychemparse.orca_elements.BlockOrcaVibrationalFrequencies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockOrcaWithStandardHeader
The block captures and stores vibrational frequencies data from ORCA output files.
Example of ORCA Output:
----------------------- VIBRATIONAL FREQUENCIES ----------------------- Scaling factor for frequencies = 1.000000000 (already applied!) 0: 0.00 cm**-1 1: 0.00 cm**-1 2: 0.00 cm**-1 3: 0.00 cm**-1 4: 0.00 cm**-1 5: 0.00 cm**-1 6: -15.28 cm**-1 ***imaginary mode*** 7: 32.56 cm**-1 8: 38.76 cm**-1 9: 48.22 cm**-1 10: 89.12 cm**-1 11: 101.15 cm**-1 12: 114.47 cm**-1 13: 135.76 cm**-1
- data_available: bool = True
Formatted data is available for this block.
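The parsed structure of this block is not documented here, so the following is only an illustrative sketch of reading the frequency table and picking out imaginary modes. The names `parse_frequencies` and `FREQ_RE` and the resulting dictionary layout are assumptions.

```python
import re

# Excerpt of the ORCA example above; the line layout is assumed.
SAMPLE = """\
   5:         0.00 cm**-1
   6:       -15.28 cm**-1 ***imaginary mode***
   7:        32.56 cm**-1
"""

# Hypothetical pattern: "<mode index>: <frequency> cm**-1"
FREQ_RE = re.compile(r"^[ \t]*(\d+):[ \t]+(-?\d+\.\d+)[ \t]+cm\*\*-1", re.MULTILINE)


def parse_frequencies(text: str) -> dict:
    """Map mode index to frequency in cm**-1 (plain floats)."""
    return {int(m.group(1)): float(m.group(2)) for m in FREQ_RE.finditer(text)}


frequencies = parse_frequencies(SAMPLE)
# Negative values correspond to the lines ORCA marks as imaginary modes.
imaginary = [mode for mode, value in frequencies.items() if value < 0]
```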
- class pychemparse.orca_elements.BlockOrcaWarnings(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- class pychemparse.orca_elements.BlockOrcaWithStandardHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
Handles blocks with a standard header format by extending the Block class.
This class is designed to process data blocks that come with a standardized header marked by lines of repeating special characters (e.g., ‘-’, ‘*’, ‘#’). It overrides the extract_name_header_and_body method to parse these headers, facilitating the separation of the block into name, header, and body components for easier readability and manipulation.
Parameters
None
Methods
- extract_name_header_and_body()
Parses the block’s content to extract the name, header (if present), and body, adhering to a standard header format.
Raises
- Warning
If the block’s content does not contain a recognizable header, indicating that the format may not conform to expectations.
- extract_name_header_and_body() tuple[str, str | None, str]
Identifies and separates the name, header, and body of the block based on a standard header format.
Utilizes regular expressions to discern the header portion from the body, processing the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.
Returns
- tuple[str, str | None, str]
The name of the block, the header content (or None if a header is not present), and the body of the block.
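To make the header convention concrete, here is a small sketch of splitting a block whose header is enclosed by lines of repeated special characters. It is an assumption-laden illustration, not the class's actual regular expression: `split_header` and `HEADER_RE` are hypothetical, and the fallback for header-less input (warn, then return the first line as name) is a guess at reasonable behaviour.

```python
import re
import warnings

# Hypothetical pattern: delimiter line, header text, delimiter line.
HEADER_RE = re.compile(
    r"\A[ \t]*[-*#=]{5,}[ \t]*\n(?P<header>.*?)\n[ \t]*[-*#=]{5,}[ \t]*\n",
    re.DOTALL,
)


def split_header(raw: str):
    """Return (name, header or None, body) for a raw block."""
    match = HEADER_RE.match(raw)
    if match is None:
        warnings.warn("no recognizable header found")
        first_line = raw.partition("\n")[0]
        return first_line.strip(), None, raw
    header = match.group("header").strip()
    name = header.splitlines()[0].strip()  # first header line serves as the name
    return name, header, raw[match.end():]


name, header, body = split_header(
    "----------------\n"
    "TOTAL SCF ENERGY\n"
    "----------------\n"
    "\n"
    "Total Energy : -379.43011624 Eh\n"
)
```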
pychemparse.gpaw_elements module
- class pychemparse.gpaw_elements.AvailableBlocksGpaw
Bases:
AvailableBlocksGeneral
A class to store all available blocks for GPAW.
- blocks: dict[str, type[Element]] = {'BlockGpawConvergedAfter': <class 'pychemparse.gpaw_elements.BlockGpawConvergedAfter'>, 'BlockGpawDipole': <class 'pychemparse.gpaw_elements.BlockGpawDipole'>, 'BlockGpawEnergyContributions': <class 'pychemparse.gpaw_elements.BlockGpawEnergyContributions'>, 'BlockGpawIcon': <class 'pychemparse.gpaw_elements.BlockGpawIcon'>, 'BlockGpawOrbitalEnergies': <class 'pychemparse.gpaw_elements.BlockGpawOrbitalEnergies'>, 'BlockGpawTiming': <class 'pychemparse.gpaw_elements.BlockGpawTiming'>}
- class pychemparse.gpaw_elements.BlockGpawConvergedAfter(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Converged after from GPAW output files.
Example of GPAW Output:
Converged after 12 iterations.
- data() Data
- Returns:
pychemparse.data.Data object that contains:
- int Iterations
- bool Converged, which is always True, as the block is only extracted if the calculation is converged
Parsed data example:
{'Iterations': 12, 'Converged': True}
- Return type:
- data_available: bool = True
Formatted data is available for this block.
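The mapping from the GPAW line to the parsed dictionary can be sketched in a few lines. `parse_converged` is a hypothetical helper, shown only to illustrate why Converged is always True whenever the block exists.

```python
import re


def parse_converged(text: str) -> dict:
    """Extract the iteration count; an empty dict means the line is absent."""
    match = re.search(r"Converged after (\d+) iterations", text)
    if match is None:
        return {}
    # The line only appears for converged runs, hence the constant True.
    return {"Iterations": int(match.group(1)), "Converged": True}


converged = parse_converged("Converged after 12 iterations.")
```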
- class pychemparse.gpaw_elements.BlockGpawDipole(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Dipole from GPAW output files.
Example of GPAW Output:
Dipole moment: (-0.000000, 0.000000, -1.948262) |e|*Ang
- data() Data
- Returns:
pychemparse.data.Data object that contains: pint.Quantity Dipole Moment in |e|*Ang. Can be converted to Debye with .to('D').
Parsed data example:
{'Dipole Moment': <Quantity([ 0. -0. -1.128191], 'angstrom * elementary_charge')>}
- Return type:
- data_available: bool = True
Formatted data is available for this block.
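For readers without pint at hand, the unit conversion mentioned above can be reproduced from the defining constants. This sketch parses the dipole line into plain floats and converts them to Debye; `parse_dipole` is a hypothetical name and the code is independent of the package.

```python
import re

# 1 |e|*Ang in C*m (exact SI elementary charge times 1e-10 m).
E_ANGSTROM_IN_CM = 1.602176634e-19 * 1e-10
# 1 Debye in C*m (1e-21 / c, with c in m/s).
DEBYE_IN_CM = 1e-21 / 299792458


def parse_dipole(line: str) -> tuple:
    """Return the three dipole components in |e|*Ang as floats."""
    match = re.search(r"Dipole moment:\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no dipole moment line found")
    return tuple(float(part) for part in match.group(1).split(","))


dipole = parse_dipole("Dipole moment: (-0.000000, 0.000000, -1.948262) |e|*Ang")
dipole_debye = tuple(c * E_ANGSTROM_IN_CM / DEBYE_IN_CM for c in dipole)
```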
- class pychemparse.gpaw_elements.BlockGpawEnergyContributions(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Energy contributions from GPAW output files.
Example of GPAW Output:
Energy contributions relative to reference atoms: (reference = -10231.780790) Kinetic: +111.119958 Potential: -114.654058 External: +0.000000 XC: -93.096053 Entropy (-ST): +0.000000 Local: +0.390037 -------------------------- Free energy: -96.240117 Extrapolated: -96.240117
- data() Data
- Returns:
pychemparse.data.Data object that contains:
- pint.Quantity Reference in eV
- pint.Quantity Free energy in eV
- pint.Quantity Extrapolated in eV
- dict Contributions with pint.Quantity values in eV
Parsed data example:
{'Contributions': {'Kinetic': <Quantity(106.291868, 'electron_volt')>, 'Potential': <Quantity(-113.401291, 'electron_volt')>, 'External': <Quantity(0.0, 'electron_volt')>, 'XC': <Quantity(-93.210989, 'electron_volt')>, 'Entropy (-ST)': <Quantity(0.0, 'electron_volt')>, 'Local': <Quantity(0.39059, 'electron_volt')>}, 'Reference': <Quantity(-10231.7808, 'electron_volt')>, 'Free energy': <Quantity(-99.929821, 'electron_volt')>, 'Extrapolated': <Quantity(-99.929821, 'electron_volt')> }
- Return type:
- data_available: bool = True
Formatted data is available for this block.
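The split between individual contributions and the totals below the dashed line can be sketched as follows. This is an illustrative parser on plain floats (no pint), with hypothetical names and an assumed line layout for the sample.

```python
import re

# GPAW example from above; the line layout is assumed.
SAMPLE = """\
Energy contributions relative to reference atoms: (reference = -10231.780790)

Kinetic:       +111.119958
Potential:     -114.654058
External:        +0.000000
XC:             -93.096053
Entropy (-ST):   +0.000000
Local:           +0.390037
--------------------------
Free energy:    -96.240117
Extrapolated:   -96.240117
"""

VALUE_RE = re.compile(r"\s*(.+?):\s*([-+]?\d+\.\d+)\s*$")


def parse_energy_contributions(text: str) -> dict:
    """Entries above the dashed line go to 'Contributions', the rest to the top level."""
    result = {"Contributions": {}}
    reference = re.search(r"reference\s*=\s*([-+]?\d+\.\d+)", text)
    if reference:
        result["Reference"] = float(reference.group(1))
    after_separator = False
    for line in text.splitlines():
        if line.strip().startswith("-----"):
            after_separator = True
            continue
        match = VALUE_RE.match(line)
        if match is None:
            continue
        target = result if after_separator else result["Contributions"]
        target[match.group(1)] = float(match.group(2))
    return result


energies = parse_energy_contributions(SAMPLE)
```

As a sanity check, the contributions in the sample sum to the printed free energy within rounding.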
- class pychemparse.gpaw_elements.BlockGpawIcon(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
- data_available: bool = True
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- readable_name() str
Generate a readable name for the block based on its content.
Utilizes the extract_name_header_and_body method to derive a name for the block, suitable for display or identification purposes.
- Returns:
The extracted name of the block.
- Return type:
str
- class pychemparse.gpaw_elements.BlockGpawOrbitalEnergies(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Orbitals from GPAW output files.
Example of GPAW Output:
Up Down Band Eigenvalues Occupancy Eigenvalues Occupancy 0 -24.42908 1.00000 -24.57211 1.00000 1 -22.16252 1.00000 -22.18228 1.00000 2 -21.55401 1.00000 -21.60131 1.00000
- data() Data
- Returns:
pychemparse.data.Data object that contains: pandas.DataFrame UpDownOrbitals with columns: Band, Eigenvalues_Up, Occupancy_Up, Eigenvalues_Down, Occupancy_Down. Eigenvalues are in eV.
Parsed data example:
{'UpDownOrbitals': Band Eigenvalues_Up Occupancy_Up Eigenvalues_Down Occupancy_Down 0 0 -24.42908 1.0 -24.57211 1.0 1 1 -22.16252 1.0 -22.18228 1.0 2 2 -21.55401 1.0 -21.60131 1.0 3 3 -19.15063 1.0 -19.19201 1.0 4 4 -19.10920 1.0 -19.10168 1.0 .. ... ... ... ... ... 247 247 81.59782 0.0 81.62746 0.0 248 248 81.85757 0.0 81.83158 0.0 249 249 83.60243 0.0 83.51849 0.0 250 250 87.94628 0.0 87.90765 0.0 251 251 95.86929 0.0 95.86901 0.0 [252 rows x 5 columns]}
- Return type:
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.gpaw_elements.BlockGpawTiming(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores Timing from GPAW output files.
Example of GPAW Output:
Timing: incl. excl. ------------------------------------------------------------ Basic WFS set positions: 0.000 0.000 0.0% | Redistribute: 0.000 0.000 0.0% | Basis functions set positions: 0.003 0.003 0.0% | ... ST tci: 0.001 0.001 0.0% | Set symmetry: 0.000 0.000 0.0% | TCI: Evaluate splines: 0.182 0.182 2.4% || mktci: 0.001 0.001 0.0% | Other: 0.803 0.803 10.5% |---| ------------------------------------------------------------ Total: 7.661 100.0%
- data() Data
Parses the timing data, maintains the hierarchy, and extracts the total time separately.
- Returns:
pychemparse.data.Data object that contains:
- Total: A dictionary with ‘Total Time’ and ‘Percentage’.
- TimingHierarchy: A list of timing data entries maintaining the hierarchy.
- Return type:
- data_available: bool = True
Formatted data is available for this block.
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
pychemparse.vasp_elements module
- class pychemparse.vasp_elements.AvailableBlocksVasp
Bases:
AvailableBlocksGeneral
A class to store all available blocks for VASP.
- blocks: dict[str, type[Element]] = {'BlockVaspFreeEnergyOfTheIonElectronSystem': <class 'pychemparse.vasp_elements.BlockVaspFreeEnergyOfTheIonElectronSystem'>, 'BlockVaspGeneralTiming': <class 'pychemparse.vasp_elements.BlockVaspGeneralTiming'>, 'BlockVaspWarning': <class 'pychemparse.vasp_elements.BlockVaspWarning'>, 'BlockVaspWithSingleLineHeader': <class 'pychemparse.vasp_elements.BlockVaspWithSingleLineHeader'>, 'BlockVaspWithStandardHeader': <class 'pychemparse.vasp_elements.BlockVaspWithStandardHeader'>}
- class pychemparse.vasp_elements.BlockVaspFreeEnergyOfTheIonElectronSystem(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockVaspWithSingleLineHeader
The block captures and stores the free energy of the ion-electron system from VASP output files.
Example of VASP Output:
Free energy of the ion-electron system (eV) --------------------------------------------------- alpha Z PSCENC = 856.26359874 Ewald energy TEWEN = 124561.82273922 -Hartree energ DENC = -158586.56090100 -exchange EXHF = 0.00000000 -V(xc)+E(xc) XCENC = 1621.64044307 PAW double counting = 40935.10832877 -40536.82457645 entropy T*S EENTRO = -0.11542442 eigenvalues EBANDS = -6251.33904632 atomic energy EATOM = 37032.80098409 Solvation Ediel_sol = 0.00000000 --------------------------------------------------- free energy TOTEN = -367.20385430 eV energy without entropy = -367.08842988 energy(sigma->0) = -367.14614209
- data() Data
- Returns:
pychemparse.data.Data object that contains:
- pint.Quantity objects for energy components in eV
- tuples of pint.Quantity objects for PAW double counting in eV, if present in the block
Parsed data example:
{'alpha Z PSCENC': 856.26359874 <Unit('electron_volt')>, 'Ewald energy TEWEN': 124531.99989886 <Unit('electron_volt')>, '-Hartree energ DENC': -158146.11578475 <Unit('electron_volt')>, '-exchange EXHF': 0.0 <Unit('electron_volt')>, '-V(xc)+E(xc) XCENC': 1631.52209578 <Unit('electron_volt')>, 'PAW double counting': (29408.33949787 <Unit('electron_volt')>, -29013.20232444 <Unit('electron_volt')>), 'entropy T*S EENTRO': -0.07591362 <Unit('electron_volt')>, 'eigenvalues EBANDS': -1504.38797381 <Unit('electron_volt')>, 'atomic energy EATOM': 37032.80098409 <Unit('electron_volt')>, 'Solvation Ediel_sol': 0.0 <Unit('electron_volt')>, 'free energy TOTEN': 4797.14407872 <Unit('electron_volt')>, 'energy without entropy': 4797.21999234 <Unit('electron_volt')>, 'energy(sigma->0)': 4797.18203553 <Unit('electron_volt')> }
- Return type:
- data_available: bool = True
Formatted data is available for this block.
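The one notable case in this block is PAW double counting, which carries two values and is therefore stored as a tuple. A minimal sketch of that rule, using plain floats and a hypothetical `parse_free_energy` helper on an excerpt of the example (line layout assumed):

```python
import re

# Excerpt of the VASP example above; the line layout is assumed.
SAMPLE = """\
  alpha Z        PSCENC =       856.26359874
  Ewald energy   TEWEN  =    124561.82273922
  PAW double counting   =     40935.10832877   -40536.82457645
  free  energy   TOTEN  =      -367.20385430 eV
"""


def parse_free_energy(text: str) -> dict:
    """Map each label to a float, or to a tuple when several values follow '='."""
    data = {}
    for line in text.splitlines():
        label, separator, values = line.partition("=")
        if not separator:
            continue
        numbers = [float(v) for v in re.findall(r"[-+]?\d+\.\d+", values)]
        if not numbers:
            continue
        key = " ".join(label.split())  # collapse the column padding
        data[key] = numbers[0] if len(numbers) == 1 else tuple(numbers)
    return data


free_energy = parse_free_energy(SAMPLE)
```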
- class pychemparse.vasp_elements.BlockVaspGeneralTiming(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores general timing and accounting information from VASP output files.
Example of VASP Output:
General timing and accounting informations for this job: ======================================================== Total CPU time used (sec): 1410.943 User time (sec): 1394.056 System time (sec): 16.888 Elapsed time (sec): 1460.875 Maximum memory used (kb): 201324. Average memory used (kb): N/A Minor page faults: 310377 Major page faults: 212 Voluntary context switches: 5646
- data() Data
- Returns:
pychemparse.data.Data object that contains:
- datetime.timedelta objects for time components in seconds
- bitmath.Byte objects for memory components in bytes
- bitmath.kB objects for memory components in kilobytes
- bitmath.MB objects for memory components in megabytes
- bitmath.GB objects for memory components in gigabytes
- pint.Quantity objects for other components with units
- N/A for non-applicable values
- str for other values
Parsed data example:
{'Total CPU time used': datetime.timedelta(seconds=1410, microseconds=943000), 'User time': datetime.timedelta(seconds=1394, microseconds=56000), 'System time': datetime.timedelta(seconds=16, microseconds=888000), 'Elapsed time': datetime.timedelta(seconds=1460, microseconds=875000), 'Maximum memory used': kB(201324.0), 'Average memory used': 'N/A', 'Minor page faults': '310377', 'Major page faults': '212', 'Voluntary context switches': '5646' }
- Return type:
- data_available: bool = True
Formatted data is available for this block.
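The type dispatch listed above (times, memory, N/A, plain strings) can be illustrated with a short sketch. It is not the package's code: `parse_general_timing` is hypothetical, only the `(sec)` and `(kb)` suffixes are handled, and kilobytes are kept as plain floats instead of bitmath objects.

```python
import re
from datetime import timedelta

# Excerpt of the VASP example above; the line layout is assumed.
SAMPLE = """\
                  Total CPU time used (sec):     1410.943
                   Maximum memory used (kb):      201324.
                   Average memory used (kb):          N/A
                          Minor page faults:       310377
"""

UNIT_RE = re.compile(r"(.+?)\s*\((sec|kb)\)")


def parse_general_timing(text: str) -> dict:
    """Convert values according to the unit suffix of their label."""
    data = {}
    for line in text.splitlines():
        label, separator, value = line.rpartition(":")
        label, value = label.strip(), value.strip()
        if not separator or not value:
            continue
        unit_match = UNIT_RE.fullmatch(label)
        if unit_match is None:
            data[label] = value  # no unit: keep the raw string
            continue
        name, unit = unit_match.groups()
        if value == "N/A":
            data[name] = value
        elif unit == "sec":
            data[name] = timedelta(seconds=float(value))
        else:
            data[name] = float(value)  # kilobytes as a plain float
    return data


timing = parse_general_timing(SAMPLE)
```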
- class pychemparse.vasp_elements.BlockVaspWarning(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
The block captures and stores warning messages from VASP output files.
Example of VASP Output:
----------------------------------------------------------------------------- | | | W W AA RRRRR N N II N N GGGG !!! | | W W A A R R NN N II NN N G G !!! | | W W A A R R N N N II N N N G !!! | | W WW W AAAAAA RRRRR N N N II N N N G GGG ! | | WW WW A A R R N NN II N NN G G | | W W A A R R N N II N N GGGG !!! | | | | For optimal performance we recommend to set | | NCORE = 2 up to number-of-cores-per-socket | | NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE). | | This setting can greatly improve the performance of VASP for DFT. | | The default, NCORE=1 might be grossly inefficient on modern | | multi-core architectures or massively parallel machines. Do your | | own testing! More info at https://www.vasp.at/wiki/index.php/NCORE | | Unfortunately you need to use the default for GW and RPA | | calculations (for HF NCORE is supported but not extensively tested | | yet). | | | -----------------------------------------------------------------------------
- extract_name_header_and_body() tuple[str, str | None, str]
Extract the block’s name, header, and body components.
Offers a basic implementation for separating the block’s name from its content based on naming conventions. This method is intended to be overridden by subclasses for more specialized extraction logic.
- Returns:
A tuple containing the block’s name, optional header, and body content.
- Return type:
tuple[str, str | None, str]
- class pychemparse.vasp_elements.BlockVaspWithSingleLineHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
BlockVaspWithStandardHeader
- class pychemparse.vasp_elements.BlockVaspWithStandardHeader(raw_data: str, char_position: tuple[int, int] | None = None, line_position: tuple[int, int] | None = None)
Bases:
Block
Handles blocks with a standard header format by extending the Block class.
This class is designed to process data blocks that come with a standardized header marked by lines of repeating special characters (e.g., ‘-’, ‘*’, ‘#’). It overrides the extract_name_header_and_body method to parse these headers, facilitating the separation of the block into name, header, and body components for easier readability and manipulation.
Parameters
None
Methods
- extract_name_header_and_body()
Parses the block’s content to extract the name, header (if present), and body, adhering to a standard header format.
Raises
- Warning
If the block’s content does not contain a recognizable header, indicating that the format may not conform to expectations.
- extract_name_header_and_body() tuple[str, str | None, str]
Identifies and separates the name, header, and body of the block based on a standard header format.
Utilizes regular expressions to discern the header portion from the body, processing the header to extract a distinct name and the header content. The text following the header is treated as the body of the block.
Returns
- tuple[str, str | None, str]
The name of the block, the header content (or None if a header is not present), and the body of the block.
pychemparse.regex_request module
- class pychemparse.regex_request.RegexRequest(p_type: str, p_subtype: str, pattern: str, flags: list[str], comment: str = '')
Bases:
object
Encapsulates a regular expression request for parsing structured text.
This class defines a regular expression pattern along with associated metadata to identify and extract specific elements from text. It allows for the application of the regex pattern to text segments, facilitating the extraction of structured information based on the pattern.
- Variables:
p_type (str) – The general type of the regex request, often corresponding to a high-level category such as ‘Block’ or ‘Element’.
p_subtype (str) – A more specific identifier within the broader type, providing additional context or classification.
pattern (str) – The actual regular expression pattern used for matching text.
flags (int) – The combined regex flags compiled into an integer, determining how the regex pattern is applied.
comment (str) – An optional description or note about the purpose or nature of the regex request.
- apply(marked_text: list[tuple[tuple[int, int], tuple[int, int], Element]] | str, mode: str = 'ORCA', show_progress: bool = False) tuple[str, dict[str, dict]]
Applies the regex pattern to marked text or a raw string to identify and extract elements based on the pattern.
This method iterates over the marked text or processes a string to identify matches to the regex pattern. Extracted elements are then organized based on their positions within the text.
- Parameters:
marked_text (Union[list[tuple[tuple[int, int], tuple[int, int], Element]], str]) – The marked text or raw string to which the regex pattern will be applied. Marked text should be a list of tuples, each containing character positions, line numbers, and an associated Element. If a raw string is provided, it will be converted to the marked text.
mode (str) – The operational mode for element extraction, typically indicating the type of data being processed (e.g., ‘ORCA’ or ‘GPAW’).
show_progress (bool) – Indicates whether a progress indicator should be shown during the extraction process. Useful for long-running operations.
- Returns:
A tuple containing the updated marked text and a dictionary mapping extracted elements to their positions.
- Return type:
tuple[str, dict[str, dict]]
- compile() Pattern
Compiles the regex pattern with the specified flags into a regex pattern object.
This compiled object can be used for various regex operations like findall, search, match, etc., enabling efficient pattern matching.
- Returns:
A compiled regex pattern object, ready for use in pattern matching operations.
- Return type:
Pattern
- to_dict() dict[str, str | list[str]]
Converts the RegexRequest instance to a dictionary, including flag names as strings.
- Returns:
A dictionary representation of the RegexRequest, with keys for type, subtype, pattern, flags (as a list of strings), and optional comment.
- Return type:
dict[str, Union[str, list[str]]]
- validate_configuration() None
Validates the RegexRequest configuration. Placeholder for future validation logic.
Currently, this method does not perform any checks and exists as a placeholder for potential future validation requirements.
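The relationship between flag names given as strings and the combined integer flags can be sketched with the standard `re` module alone. The helpers `combine_flags` and `flag_names` below are hypothetical and only illustrate the idea; how RegexRequest performs this internally is not shown in this reference.

```python
import re
from functools import reduce
from operator import or_


def combine_flags(names: list) -> int:
    """OR together the re flags named in the list, e.g. ['MULTILINE', 'DOTALL']."""
    return reduce(or_, (getattr(re, name) for name in names), 0)


def flag_names(flags: int) -> list:
    """Recover the names of the individual flags contained in a combined value."""
    return [member.name for member in re.RegexFlag if member.value and flags & member]


flags = combine_flags(["MULTILINE", "DOTALL"])
compiled = re.compile(r"^Program Version.*$", flags)
```

Storing names in a dictionary or JSON file and converting them only at compile time keeps the serialized form readable.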
pychemparse.regex_settings module
- pychemparse.regex_settings.DEFAULT_GPAW_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/gpaw_regex.json'
Path to the default GPAW regex settings JSON file, included with the package. Type: str
- pychemparse.regex_settings.DEFAULT_GPAW_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])
The pre-loaded RegexSettings instance containing the default regex patterns for GPAW output parsing. Type: RegexSettings
- pychemparse.regex_settings.DEFAULT_ORCA_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/orca_regex.json'
Path to the default ORCA regex settings JSON file, included with the package. Type: str
- pychemparse.regex_settings.DEFAULT_ORCA_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])
The pre-loaded RegexSettings instance containing the default regex patterns for ORCA output parsing. Type: RegexSettings
- pychemparse.regex_settings.DEFAULT_VASP_REGEX_FILE = '/home/runner/work/ChemParse/ChemParse/pychemparse/vasp_regex.json'
Path to the default VASP regex settings JSON file, included with the package. Type: str
- pychemparse.regex_settings.DEFAULT_VASP_REGEX_SETTINGS = RegexSettings(Order: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'], Items: ['TypeKnownBlocks', 'TypeDefaultBlocks', 'Spacer'])
The pre-loaded RegexSettings instance containing the default regex patterns for VASP output parsing. Type: RegexSettings
- class pychemparse.regex_settings.RegexBlueprint(order: list[str], pattern_structure: dict[str, str], pattern_texts: dict[str, str], comment: str)
Bases:
object
A class representing a blueprint for generating multiple RegexRequest objects with a shared structure.
The RegexBlueprint class is useful for defining a common structure for multiple regex patterns that share a similar format. By defining a blueprint with a pattern structure and a set of pattern texts, you can generate multiple RegexRequest objects with the same structure but different text snippets. This is particularly useful when you have a set of related patterns that follow a consistent format but differ in specific details.
- Parameters:
order (list[str]) – The ordered list of keys that defines the sequence of generated RegexRequest objects.
pattern_structure (dict[str, str]) – A dictionary defining the common structure of regex patterns, including beginning, ending, and flags keys.
pattern_texts (dict[str, str]) – A dictionary mapping each key in the order list to a specific text snippet to be inserted into the pattern structure.
comment (str) – A comment or description associated with this blueprint.
Attributes
- order: list[str]
The ordered list of keys that defines the sequence of generated RegexRequest objects.
- pattern_structure: dict[str, str]
A dictionary defining the common structure of regex patterns, including ‘beginning’, ‘ending’, and ‘flags’ keys.
- pattern_texts: dict[str, str]
A dictionary mapping each key in the ‘order’ list to a specific text snippet to be inserted into the pattern structure.
- comment: str
A comment or description associated with this blueprint.
Examples
import pychemparse as chp

# Define a blueprint for extracting multiple blocks of text from an ORCA output file
rb = chp.RegexBlueprint(
    order=[
        "BlockOrcaVersion",
        "BlockOrcaContributions",
    ],
    pattern_structure={
        "beginning": r"^([ \t]*",
        "ending": r".*?\n(?:^(?!^[ \t]*[\-\*\#\=]{5,}.*$|^[ \t]*$).*\n)*)",
        "flags": ["MULTILINE"],
    },
    pattern_texts={
        "BlockOrcaVersion": "Program Version",
        "BlockOrcaContributions": "With contributions from",
    },
    comment="Blueprint: Paragraph with the line that starts with specified text.",
)

# Generate a list of RegexRequest objects based on the blueprint
regex_requests = rb.to_list()

# Validate the blueprint configuration
rb.validate_configuration()
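To make the idea of a shared structure concrete, the sketch below expands the same structure and texts with the standard re module only. It assumes each pattern is formed by plain concatenation of beginning, the item's text, and ending, and that flag names map to attributes of re; the library's actual expansion rule may differ. The helper build_patterns and the sample text are hypothetical.

```python
import re

# Values copied from the RegexBlueprint example above.
pattern_structure = {
    "beginning": r"^([ \t]*",
    "ending": r".*?\n(?:^(?!^[ \t]*[\-\*\#\=]{5,}.*$|^[ \t]*$).*\n)*)",
    "flags": ["MULTILINE"],
}
pattern_texts = {
    "BlockOrcaVersion": "Program Version",
    "BlockOrcaContributions": "With contributions from",
}
order = ["BlockOrcaVersion", "BlockOrcaContributions"]


def build_patterns(order, structure, texts):
    """Assumed expansion rule: beginning + text + ending, one pattern per key."""
    flags = 0
    for name in structure["flags"]:
        flags |= getattr(re, name)  # e.g. "MULTILINE" -> re.MULTILINE
    return {
        key: re.compile(structure["beginning"] + texts[key] + structure["ending"], flags)
        for key in order
    }


# Made-up text for illustration only; this is not real ORCA output.
sample = (
    "Header line\n"
    "  Program Version 1.2.3\n"
    "  build info\n"
    "\n"
    "With contributions from\n"
    "  A. Author\n"
    "-----\n"
    "footer\n"
)

patterns = build_patterns(order, pattern_structure, pattern_texts)
# Group 1 holds the matched paragraph: the starting line plus the following
# lines, up to a blank line or a separator line of five or more -, *, # or =.
blocks = {key: rx.search(sample).group(1) for key, rx in patterns.items()}
```

Under these assumptions, each key captures the paragraph that begins with its text, so adding another block type only requires a new entry in pattern_texts and order.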
- add_item(name: str, pattern_text: str) None
Adds a new item to the blueprint, updating the pattern texts, items, and order accordingly.
- Parameters:
name (str) – The key for the new pattern text.
pattern_text (str) – The text snippet to be inserted into the pattern structure for the new item.
- to_dict() dict[str, list[str] | dict[str, str | list[str]] | str]
Converts the RegexBlueprint instance into a dictionary representation.
- Returns:
A dictionary containing the blueprint’s ordered keys, pattern structure, pattern texts, and an optional comment.
- Return type:
dict[str, list[str] | dict[str, str | list[str]] | str]
- to_list() list[RegexRequest]
Converts the blueprint’s items into a list of RegexRequest objects ordered according to the blueprint’s order attribute.
- Returns:
An ordered list of RegexRequest objects generated from the blueprint.
- Return type:
list[RegexRequest]
- tree(depth: int = 0) str
Generates a tree-like string representation of the blueprint, illustrating the hierarchy of pattern structures and texts.
- Parameters:
depth (int) – The initial indentation depth, used to represent hierarchical levels in the output string. The root level starts at 0.
- Returns:
A string visualization of the blueprint, with patterns and texts formatted in a hierarchical, tree-like structure.
- Return type:
str
- validate_configuration() None
Checks the blueprint’s configuration for consistency and correctness, ensuring all necessary components are correctly defined.
This includes verifying the presence of each item in the order within the pattern_texts, the inclusion of required keys in the pattern_structure, the validity of specified regex flags, and the correctness of the regex pattern.
- Raises:
ValueError – If any aspect of the blueprint’s configuration is found to be inconsistent or incorrect.
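The checks listed for validate_configuration can be pictured with the standalone sketch below. It is not the library's implementation: validate_blueprint is a hypothetical helper, and it assumes that patterns are built by concatenating beginning, text, and ending, and that valid flags are the names of re.RegexFlag members.

```python
import re

REQUIRED_KEYS = ("beginning", "ending", "flags")


def validate_blueprint(order, pattern_structure, pattern_texts):
    """Raise ValueError on the first inconsistency found (hypothetical helper)."""
    # 1. Every ordered item needs a pattern text.
    missing_texts = [key for key in order if key not in pattern_texts]
    if missing_texts:
        raise ValueError(f"No pattern text for ordered items: {missing_texts}")

    # 2. The structure must define all required keys.
    missing_keys = [key for key in REQUIRED_KEYS if key not in pattern_structure]
    if missing_keys:
        raise ValueError(f"pattern_structure lacks keys: {missing_keys}")

    # 3. Each flag name must resolve to a real regex flag.
    flags = 0
    for name in pattern_structure["flags"]:
        flag = getattr(re, name, None)
        if not isinstance(flag, re.RegexFlag):
            raise ValueError(f"Unknown regex flag: {name!r}")
        flags |= flag

    # 4. Each assembled pattern must compile (assumed concatenation rule).
    for key in order:
        pattern = (
            pattern_structure["beginning"]
            + pattern_texts[key]
            + pattern_structure["ending"]
        )
        try:
            re.compile(pattern, flags)
        except re.error as exc:
            raise ValueError(f"Invalid regex for {key!r}: {exc}") from exc
```

Reporting every problem as ValueError matches the single exception type documented above, so callers need only one except clause.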
- class pychemparse.regex_settings.RegexSettings(settings_file: str | None = None, items: dict[str, RegexRequest | RegexSettings] | None = None, order: list[str] | None = None)
Bases:
object
Manages a collection of regex patterns and settings, supporting hierarchical organization and JSON-based configuration.
This class facilitates the organization, storage, and retrieval of regex patterns and their configurations. It can be directly instantiated with regex patterns and an execution order or loaded from a JSON file containing the configurations.
- Variables:
items (dict[str, RegexRequest | RegexSettings]) – A mapping from names to RegexRequest objects or nested RegexSettings, representing individual regex patterns or groups of patterns.
order (list[str]) – The order in which the regex patterns or groups should be applied or processed.
- add_item(name: str, item: RegexRequest | RegexSettings, rewrite: bool = False) None
Adds a new regex pattern or settings group to the RegexSettings instance.
- Parameters:
name (str) – The unique name/key for the new item.
item (RegexRequest | RegexSettings) – The RegexRequest or RegexSettings instance to be added.
rewrite (bool, optional) – If True, an existing item with the same name will be overwritten. Defaults to False.
- Raises:
ValueError – If an item with the same name already exists and rewrite is False.
- get_ordered_items() list[RegexRequest | RegexSettings]
Retrieves the regex items in the order specified by self.order.
- Returns:
An ordered list of RegexRequest objects and/or RegexSettings instances.
- Return type:
list[RegexRequest | RegexSettings]
- Raises:
ValueError – If the order list references names not present in the items dictionary.
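The add_item and get_ordered_items contracts described above can be illustrated with a small stand-in class. OrderedRegistry is hypothetical and stores plain strings instead of RegexRequest objects; the choice to append new names to the order, and to keep an overwritten item's position, is an assumption rather than documented behaviour.

```python
class OrderedRegistry:
    """Hypothetical stand-in mirroring the documented items/order contract."""

    def __init__(self):
        self.items = {}
        self.order = []

    def add_item(self, name, item, rewrite=False):
        # Refuse to silently replace an existing item unless asked to.
        if name in self.items and not rewrite:
            raise ValueError(f"Item {name!r} already exists; pass rewrite=True to replace it")
        if name not in self.order:
            self.order.append(name)  # assumption: new names go to the end of the order
        self.items[name] = item

    def get_ordered_items(self):
        # The order list must only reference names that exist in items.
        missing = [name for name in self.order if name not in self.items]
        if missing:
            raise ValueError(f"Order references unknown items: {missing}")
        return [self.items[name] for name in self.order]


registry = OrderedRegistry()
registry.add_item("Spacer", r"^\s*\n")
registry.add_item("TypeKnownBlocks", "known")
registry.add_item("Spacer", r"^[ \t]*\n", rewrite=True)  # replaces, keeps its position
ordered = registry.get_ordered_items()
```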
- items: dict[str, RegexRequest | RegexSettings]
- load_settings(settings_file: str) None
Populates the RegexSettings instance with configurations from a specified JSON file.
- Parameters:
settings_file (str) – The file path to the JSON file containing regex configurations.
- order: list[str]
- parse_settings(settings: dict[str, dict | list[str]]) None
Parses a settings dictionary to populate the RegexSettings instance with RegexRequest, RegexBlueprint, or nested RegexSettings.
- Parameters:
settings (dict[str, dict | list[str]]) – A dictionary containing the configuration for regex patterns. It may define RegexRequest objects directly, specify RegexBlueprint configurations, or contain nested RegexSettings.
- save_as_json(filename: str) None
Exports the RegexSettings configuration to a JSON file, preserving the nested structure and order of regex patterns.
- Parameters:
filename (str) – The file path where the JSON representation of the regex settings should be saved.
- set_order(order: list[str]) None
Defines the processing order for the regex items within this RegexSettings instance.
- Parameters:
order (list[str]) – The sequence of item names, determining the order in which items are processed.
- Raises:
ValueError – If any name in the provided order does not correspond to an existing item in self.items.
- to_dict() dict[str, dict | list[str]]
Serializes the RegexSettings instance, including its nested structure, into a dictionary format suitable for JSON serialization.
- Returns:
A dictionary representation of the RegexSettings instance, capturing the order of items and the nested regex configurations.
- Return type:
dict[str, dict | list[str]]
- to_list() list[RegexRequest | RegexSettings]
Converts the RegexSettings instance to a flattened list of RegexRequest objects, including those from nested RegexSettings.
- Returns:
A list containing all RegexRequest objects and RegexSettings instances, expanded in order.
- Return type:
list[RegexRequest | RegexSettings]
- Raises:
TypeError – If an item within self.items is neither a RegexRequest nor a RegexSettings instance.
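The flattening that to_list describes amounts to a depth-first walk over the nested groups. The sketch below uses hypothetical stand-in classes (Request for RegexRequest, Group for RegexSettings) and assumes every nested group is fully expanded, so the result holds only leaf requests.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Stand-in for RegexRequest."""
    name: str


@dataclass
class Group:
    """Stand-in for a (possibly nested) RegexSettings."""
    items: dict = field(default_factory=dict)
    order: list = field(default_factory=list)


def flatten(group):
    """Return leaf requests in processing order, expanding nested groups."""
    flat = []
    for name in group.order:
        item = group.items[name]
        if isinstance(item, Request):
            flat.append(item)
        elif isinstance(item, Group):
            flat.extend(flatten(item))  # recurse into the nested group
        else:
            raise TypeError(f"Item {name!r} is neither a Request nor a Group")
    return flat


inner = Group(items={"a": Request("a"), "b": Request("b")}, order=["b", "a"])
outer = Group(
    items={"first": Request("first"), "nested": inner, "last": Request("last")},
    order=["first", "nested", "last"],
)
flat_names = [request.name for request in flatten(outer)]
```

The nested group's own order is honoured inside the outer order, which is why "b" precedes "a" in the flattened result.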
- tree(depth: int = 0) str
Visualizes the regex settings hierarchy as a tree-like structure, showing the nested organization of patterns and groups.
- Parameters:
depth (int) – The initial indentation level, used to visually represent the depth of nested structures.
- Returns:
A string visualization of the settings hierarchy, formatted as an indented tree structure.
- Return type:
str
- validate_configuration() None
Ensures that each item listed in the order is present in the items dictionary and validates nested configurations.
- Raises:
ValueError – If an ordered item is missing from the items dictionary.
RuntimeWarning – If there are items not included in the order.
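The two outcomes above differ in severity: a name in the order without a matching item is an error, while an item left out of the order is only reported. A minimal sketch of that distinction follows; validate_order is a hypothetical helper, and it assumes the RuntimeWarning is issued through warnings.warn rather than raised as an exception.

```python
import warnings


def validate_order(items, order):
    """Error on dangling order entries; warn about items that are never ordered."""
    missing = [name for name in order if name not in items]
    if missing:
        raise ValueError(f"Ordered items missing from items: {missing}")

    unordered = [name for name in items if name not in order]
    if unordered:
        # Assumption: reported as a warning, so processing can continue.
        warnings.warn(f"Items not included in the order: {unordered}", RuntimeWarning)
```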
pychemparse.scripts module
- pychemparse.scripts.chem_parse(input_file: str, output_file: str, file_format: str = 'auto', readable_name: str | None = None, raw_data_substrings: list[str] = [], raw_data_not_substrings: list[str] = [], mode: str = 'ORCA')
Parses an ORCA, GPAW, or VASP output file (selected by mode) and exports filtered data to the specified format.
This function supports exporting to CSV, JSON, HTML, and Excel formats. The output format can be auto-detected based on the file extension of the output path. Data can be filtered by readable names or the presence/absence of specific substrings in the raw data.
- Parameters:
input_file (str) – The path to the output file to be processed (ORCA, GPAW, or VASP, matching mode).
output_file (str) – The file path where the exported data will be saved.
file_format (str, optional) – The desired output format (‘auto’, ‘csv’, ‘json’, ‘html’, ‘xlsx’). If ‘auto’, the format is inferred from the output file extension.
readable_name (str | None, optional) – Filters elements by their readable name, if specified.
raw_data_substrings (list[str], optional) – Filters elements containing these substrings in their raw data.
raw_data_not_substrings (list[str], optional) – Filters elements not containing these substrings in their raw data.
mode (str, optional) – Specifies the mode of the input file, which can be ‘ORCA’, ‘GPAW’ or ‘VASP’. Default is ‘ORCA’.
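The 'auto' format detection and the two substring filters described above can be sketched with the standard library alone. Both helpers are hypothetical, and several details are assumptions not stated in this reference: extensions are compared case-insensitively, an unsupported format raises ValueError, an element must contain all of raw_data_substrings, and it must contain none of raw_data_not_substrings.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"csv", "json", "html", "xlsx"}


def resolve_format(output_file, file_format="auto"):
    """Infer the export format from the file extension when set to 'auto'."""
    if file_format == "auto":
        file_format = Path(output_file).suffix.lstrip(".").lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {file_format!r}")
    return file_format


def keep_element(raw_data, substrings=(), not_substrings=()):
    """Keep an element if it has every required substring and no excluded one."""
    has_required = all(text in raw_data for text in substrings)
    has_excluded = any(text in raw_data for text in not_substrings)
    return has_required and not has_excluded
```

With empty filter lists, keep_element accepts every element, which mirrors the documented defaults of no filtering.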
- pychemparse.scripts.chem_parse_cli()
Command-line interface for the chem_parse function, allowing users to export data from an ORCA, GPAW, or VASP output file from the terminal.
This CLI provides options for specifying the input and output file paths, the desired output format, filtering criteria based on readable names and raw data substrings, and the processing mode.
- pychemparse.scripts.chem_to_html(input_file: str, output_file: str, insert_css: bool = True, insert_js: bool = True, insert_left_sidebar: bool = True, insert_colorcomment_sidebar: bool = True, mode: str = 'ORCA') None
Converts an ORCA, GPAW, or VASP output file (selected by mode) to an HTML document with optional features such as CSS, JavaScript, and sidebars.
- Parameters:
input_file (str) – The path to the input file, typically an ORCA output file.
output_file (str) – The destination path where the HTML file will be saved.
insert_css (bool, optional) – If True, includes default CSS styles in the HTML output.
insert_js (bool, optional) – If True, includes JavaScript for interactive elements in the HTML output.
insert_left_sidebar (bool, optional) – If True, adds a left sidebar for navigation in the HTML output.
insert_colorcomment_sidebar (bool, optional) – If True, adds a sidebar for color-coded comments in the HTML output.
mode (str, optional) – Specifies the processing mode, which can be ‘ORCA’, ‘GPAW’ or ‘VASP’. Default is ‘ORCA’.
- pychemparse.scripts.chem_to_html_cli() None
CLI entry point for converting an ORCA or GPAW output file to an HTML document. Parses command-line arguments for input and output file paths and optional features.
This function facilitates the use of the conversion utility from the command line, allowing users to specify the input and output files as well as toggle optional features like CSS, JavaScript, and sidebars via command-line flags.
pychemparse.units_and_constants module
- pychemparse.units_and_constants.ureg = <pint.registry.UnitRegistry object>
The shared pint unit registry. Use it to define units and constants; do not create a new one.